SuperSuit: Simple Microwrappers for Reinforcement Learning Environments
Justin K. Terry
Department of Computer Science
University of Maryland, College Park
College Park, MD 20742
[email protected]
Benjamin Black
Department of Computer Science
University of Maryland, College Park
College Park, MD 20742
[email protected]
Ananth Hari
Department of Computer Engineering
University of Maryland, College Park
College Park, MD 20742
[email protected]
Abstract
In reinforcement learning, wrappers are universally used to transform the information that passes between a model and an environment. Despite their ubiquity, no library exists with reasonable implementations of all popular preprocessing methods. This leads to unnecessary bugs, code inefficiencies, and wasted developer time. Accordingly, we introduce SuperSuit, a Python library that includes all popular wrappers, as well as wrappers that can easily apply lambda functions to observations, actions, and rewards. It is compatible with the standard Gym environment specification, as well as the PettingZoo specification for multi-agent environments. The library is available at https://github.com/PettingZoo-Team/SuperSuit, and can be installed via pip.
Applying transformations to the information passing between a model and an environment is an integral part of every major experimental work in reinforcement learning (Mnih et al., 2013; Vinyals et al., 2019; Silver et al., 2017; Berner et al., 2019). Techniques popular on Atari environments include scaling down observations with image processing methods or converting observations to greyscale to reduce neural network processing time, “stacking” frames together to help establish velocity, and skipping frames to increase training speed (Mnih et al., 2013).

These “wrappers” are very useful, but using them in practice has pain points. For code modularity, ease of debugging, and ease of hyperparameter tuning, it is generally preferable to define the wrapper function(s) outside the environment. Ideally, these very commonly used functions would be distributed in a library, so that the implementation used is as fast as possible. This is fairly important considering how many times a wrapper is called in large research projects.

Gym (Brockman et al., 2016) has become the standard API and set of benchmark environments for single-agent reinforcement learning. PettingZoo (Terry et al., 2020a) has recently been released, achieving similar goals for multi-agent reinforcement learning environments. The only existing wrappers for reinforcement learning are those included in Gym, but those are primarily the initially popular wrappers for Atari preprocessing (Mnih et al., 2013). Newer preprocessing methods for Atari (Machado et al., 2018), other types of environments, and multi-agent environments are omitted. Many Gym wrappers are also missing “quality of life” features, like outputting arrays in a shape compatible with CNNs by default. Accordingly, people typically write their own wrappers themselves. This leads to lower code quality and performance throughout the field for such key functions, and makes the possibility of bugs greater.
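As a concrete illustration of one such preprocessing method, frame stacking can be sketched in a few lines. This is a minimal, self-contained sketch of the technique, not SuperSuit's implementation; the class name and interface are chosen here for illustration only.

```python
from collections import deque


class FrameStack:
    """Keeps the last k observations so an agent can infer velocity
    from otherwise-static frames (a hypothetical minimal sketch)."""

    def __init__(self, k):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, obs):
        # On reset, fill the stack with copies of the first observation
        # so the stacked shape is constant from the very first step.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(obs)
        return list(self.frames)

    def step(self, obs):
        # Each new observation pushes out the oldest one.
        self.frames.append(obs)
        return list(self.frames)
```

A real implementation would concatenate image arrays along a channel axis; lists of scalars are used here only to keep the sketch dependency-free.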
Accordingly, we’ve released the SuperSuit Python library, which includes all widely used wrappers for both Gym and PettingZoo environments. Each wrapper is a function that takes an environment object and returns one, and for clarity and modularity, each performs only a single function, hence our terming them “microwrappers”.
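The microwrapper pattern described above can be sketched as follows. This is an illustrative reimplementation of the idea (here, reward clipping), not SuperSuit's actual code; the environment class and function names are hypothetical.

```python
class ToyEnv:
    """A stand-in for a Gym-style environment (hypothetical, for illustration)."""

    def reset(self):
        return 0

    def step(self, action):
        # Returns (observation, reward, done, info) as in the Gym API.
        return 0, 10.0 * action, False, {}


def clip_reward(env, low=-1.0, high=1.0):
    """A 'microwrapper': takes an environment object and returns one,
    performing exactly one transformation (clipping rewards to [low, high])."""

    class ClippedEnv:
        def reset(self):
            return env.reset()

        def step(self, action):
            obs, rew, done, info = env.step(action)
            return obs, max(low, min(high, rew)), done, info

    return ClippedEnv()
```

Because each wrapper both accepts and returns an environment, wrappers compose by simple function chaining, e.g. `clip_reward(other_wrapper(env))`.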
The observation wrappers we include are:

• Agent Indication (Multi-Agent Only) (Gupta et al., 2017)
• Color Reduction (Greyscaling, etc.)
• Flatten Observation
• Frame Skipping (Mnih et al., 2013)
• Frame Stacking (Mnih et al., 2013)
• Observation Delay
• Observation Normalization
• Observation Padding (Multi-Agent Only) (Terry et al., 2020b)
• Recast Observation Type
• Reshape Observation
• Resize 2D/3D Observation

The action wrappers we include are:

• Action Clipping (Fujita and Maeda, 2018)
• Action Space Padding (Multi-Agent Only) (Terry et al., 2020b)
• Sticky Actions (Machado et al., 2018)

The only reward wrapper we include is:

• Reward Clipping (Mnih et al., 2013)

Additionally, we introduce lambda wrappers that take an environment and a lambda function as arguments and apply the lambda function to the environment, allowing people to easily create custom wrappers. Separate lambda wrappers exist to apply functions to actions, observations, or rewards.
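The lambda-wrapper idea can be sketched as below. This is a minimal illustration of the concept under stated assumptions, not SuperSuit's exact API; the function and class names are hypothetical.

```python
def observation_lambda(env, fn):
    """Hypothetical lambda wrapper: applies fn to every observation
    produced by reset() and step(), leaving rewards and actions untouched."""

    class WrappedEnv:
        def reset(self):
            return fn(env.reset())

        def step(self, action):
            obs, rew, done, info = env.step(action)
            return fn(obs), rew, done, info

    return WrappedEnv()


class CounterEnv:
    """Toy environment whose observation is a step counter (for illustration)."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 0.0, False, {}
```

For example, `observation_lambda(CounterEnv(), lambda obs: obs / 100.0)` yields an environment whose observations are rescaled, without writing a dedicated wrapper class. Analogous wrappers for actions and rewards apply the user's function to the action before it reaches the environment, or to the reward before it reaches the agent.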
We introduce SuperSuit, a Python library that includes reasonable implementations of all popular RL wrappers, for environments of both the Gym and PettingZoo API specifications. This will allow researchers to conduct more computationally efficient experiments, to try new RL wrappers much more easily, and to reduce the likelihood of bugs due to one-off implementations. The library is available at https://github.com/PettingZoo-Team/SuperSuit, and can be installed via pip.
Acknowledgments and Disclosure of Funding
Justin Terry was supported in part by the QinetiQ Fundamental Machine Learning Fellowship.
References
Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław Dębiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, et al. Dota 2 with large scale deep reinforcement learning. arXiv preprint arXiv:1912.06680, 2019.

Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym, 2016.

Yasuhiro Fujita and Shin-ichi Maeda. Clipped action policy gradient. arXiv preprint arXiv:1802.07564, 2018.

Jayesh K. Gupta, Maxim Egorov, and Mykel Kochenderfer. Cooperative multi-agent control using deep reinforcement learning. In International Conference on Autonomous Agents and Multiagent Systems, pages 66–83. Springer, 2017.

Marlos C. Machado, Marc G. Bellemare, Erik Talvitie, Joel Veness, Matthew J. Hausknecht, and Michael Bowling. Revisiting the arcade learning environment: Evaluation protocols and open problems for general agents. Journal of Artificial Intelligence Research, 61:523–562, 2018.

Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.

David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. Mastering the game of Go without human knowledge. Nature, 550(7676):354–359, 2017.

Justin K. Terry, Benjamin Black, Mario Jayakumar, Ananth Hari, Luis Santos, Clemens Dieffendahl, Niall Williams, Praveen Ravi, Yashas Lokesh, Caroline Horsch, and Dipam Patel. PettingZoo. https://github.com/PettingZoo-Team/PettingZoo, 2020a. GitHub repository.

Justin K. Terry, Nathaniel Grammel, Ananth Hari, and Luis Santos. Parameter sharing is surprisingly useful for multi-agent deep reinforcement learning. arXiv preprint arXiv:2005.13625, 2020b.

Oriol Vinyals, Igor Babuschkin, Wojciech M. Czarnecki, Michaël Mathieu, Andrew Dudzik, Junyoung Chung, David H. Choi, Richard Powell, Timo Ewalds, Petko Georgiev, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 575(7782):350–354, 2019.