ORRB -- OpenAI Remote Rendering Backend
Maciek Chociej, Peter Welinder, Lilian Weng
OpenAI

Figure 1: A batch of visually randomized samples (RGB, depth, normal, and segmentation channels), rendered with the OpenAI Remote Rendering Backend (ORRB), depicting the Shadow Dexterous Hand manipulating a colorful block.
Abstract
We present the OpenAI Remote Rendering Backend (ORRB), a system that allows fast and customizable rendering of robotics environments. It is based on the Unity3d [16] game engine and interfaces with the MuJoCo [14] physics simulation library. ORRB was designed with visual domain randomization in mind. It is optimized for cloud deployment and high throughput operation. We are releasing it to the public under a liberal MIT license: https://github.com/openai/orrb.

1 Introduction

Simulation-to-reality (Sim2Real) transfer is a central problem in applying reinforcement learning (RL) to real world robotics tasks. Many complex problems that can be solved in simulation fail to transfer to physical robots due to unmodeled effects or uncalibrated parameters. Systems that depend on computer vision also suffer from this fate. It is often hard to transfer from synthetic data even to a controlled lab setup, not to mention handling the endless variety of lighting conditions, material appearance and optical phenomena in a robust way.

Domain randomization [13, 11] is a promising approach to closing the reality gap. Unlike approaches that strive for photo-realism or use real data in the training pipeline, it is simple, fast and can be used to produce data in abundant amounts.
However, there are few good existing tools to perform visual domain randomization. We have created ORRB with the following goals in mind:

• Domain Randomization as First-Class Citizen: expose appearance control parameters externally and allow randomization at different levels of granularity,
• Modular, Customizable & Expandable Architecture: provide a framework that enables rapid prototyping and development of new randomization and augmentation techniques,
• Performance: optimize code for large scale, high throughput, batch rendering in a distributed, headless environment,
• Ease of Use: allow seamless, easy to configure integration with Machine Learning (ML) libraries.

We have used ORRB to successfully train Dactyl [10], a dexterous robotic hand, purely from simulated, synthetic data.

The paper is structured as follows. Section 2 gives an overview of previous work in the field, especially the use of game engines to generate synthetic training data. Section 3 elaborates on the characteristics of the typical robotics RL setup we work with, and how it influences the rendering pipeline. Section 4 focuses on system design and the technical details of the implementation. Finally, Section 5 contains a performance evaluation across different setups.
2 Related Work

Computer games are a natural choice for machine learning research. They present rich, interactive environments, with simple, general mechanics that often give rise to complex emergent gameplay. The skills needed to master different games range from the quick reflexes required to beat an arcade game, to the long term planning and strategic reasoning necessary to compete at a professional level in complex real-time strategy titles. This is why recent results on Atari [8], Dota 2 [9] and StarCraft II [17] were met with great interest, and mark the steady progress of ML/RL techniques.

A second track of research uses game engines and middleware to craft environments for non-game problems. The most prominent application is autonomous navigation. Both the Unity [16] and Unreal [2] game engines are used in autonomous drone [12] and car [15] simulations.

Finally, custom designed environments built on top of games and game engines such as Doom [7], Quake III Arena [4], Minecraft [1] and Unity [6, 5] offer a testbed for basic ML research.
3 Rendering for Machine Learning

Modern game engines are some of the most complicated pieces of software. Not only are they on the cutting edge of real-time computer graphics and physical simulation, they must also provide the flexibility to empower the creative process behind games. They need to execute on a wide variety of hardware and operating system platforms, and support enjoyable entertainment on a spectrum of devices with computing capabilities ranging from mobile phones to multi-GPU desktop computers.

Game engines are, however, designed with characteristics that make them impractical for large scale machine learning. They are optimized to generate images with resolution and frequency aimed at human perceptive capabilities. A typical game produces tens to hundreds of images per second at full-screen resolutions. A game engine outputs the images serially, one image after another, interactively controlled by the human operator, keeping temporal coherence in order to produce the illusion of smooth motion. Low latency and a consistent rendering rate are most important to provide a pleasant experience. In graphics heavy titles the rendering system is most often GPU constrained.

In large scale ML scenarios we rarely require low latency. During training we can often exploit the large data parallelism of training samples and trajectories. Because of this embarrassing parallelism we can benefit from batch rendering and reduce some of the per frame and communication overheads. Total rendering throughput proves to be the more important metric. For computer vision purposes we require only low resolution images. This in turn moves the compute bottleneck away from GPU rasterization and shading, and towards the work necessary to set up the rendering on the CPU.

A typical game environment involves multiple dynamic objects interacting in highly customized ways. The description of the state can be arbitrarily complex.
In a robotic simulation we usually operate in the framework of Markov Decision Processes (MDP) on a number of rigid and soft bodies connected by tendons and joints. This allows for a compact state description. Furthermore, we exploit the inherent parallelism of our setting and perform computation in a Single Scene Multiple States (SSMS) way. The bulk of the data, that is: geometry, textures, materials, and the kinematic hierarchy description, can be shared between instances; only the randomized parameters and the lightweight state need to be kept separate.

Most of the system design and optimization effort was put into working around the limitations of the interactive mode of operation in the Unity game engine and optimizing it for high throughput, batch, SSMS rendering.

4 System Architecture

The ORRB renderer is a standalone binary built with the Unity game engine. On start-up the binary loads a scene XML and all the necessary assets, i.e. textures and meshes. Then, the application reads problem specific configuration files. These contain the description of how the scene should be transformed and randomized for rendering. They also describe how the state description maps to the kinematic hierarchy of the reconstructed scene. Finally, ORRB starts a GRPC [3] service on a predefined port and awaits incoming update and batch render requests.
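From the client's perspective the request cycle amounts to pushing a renderer configuration once and then streaming seeded batches of simulation states. The sketch below illustrates only the client-side batching step; the RPC names `UpdateConfig` and `RenderBatch` come from the system overview, while the `Batch` container and the `make_batches` helper are hypothetical stand-ins, not part of the released API.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Batch:
    """One hypothetical RenderBatch() payload: states plus a seed."""
    states: List[List[float]]  # one joint-position vector per sample
    seed: int

def make_batches(states, batch_size, base_seed=0):
    """Chunk a stream of simulation states into seeded render batches.

    Attaching a seed to every batch keeps the randomizers deterministic:
    re-sending the same batch with the same seed reproduces the images.
    """
    return [
        Batch(states=states[i:i + batch_size], seed=base_seed + i // batch_size)
        for i in range(0, len(states), batch_size)
    ]
```

Each `Batch` would then be serialized into the binary wire-format protocol buffer and sent to a render server instance over GRPC.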
For scene description we use the MuJoCo XML format. We support a large subset of the specification, including: bodies; geoms of both mesh and primitive types; hinge, slide and ball joints; cameras; and sites. We load the material definitions and use the various parameters, such as RGB color, transparency, specular and emissive strength, to set up the Unity materials' appearance. We also support property class inheritance and XML includes. The mesh assets can be loaded from both binary and text STL files.

In addition to the XML scene description we use configuration files that contain the description of the transformations and randomizations to be applied. The configuration files are stored as human readable YAML protocol buffers. We use the same protocol buffers, albeit in binary wire format, to update the renderer configuration. Configuration updates are executed on the fly, over RPC. The ability to change randomization parameters during training opens new possibilities and makes adaptive randomization and similar techniques possible.
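A renderer configuration of this kind could look roughly as follows. The component names match the randomizers released with ORRB, but the YAML field names here are illustrative assumptions; the authoritative schema is the protocol buffer definition shipped with the repository.

```yaml
# Illustrative ORRB renderer configuration (field names are assumptions).
components:
  - name: hand_materials
    type: MaterialRandomizer       # random tint, metallic, glossiness
  - name: camera_jitter
    type: CameraRandomizer
    config:
      mode: JITTER                 # small perturbations around calibration
  - name: scene_lights
    type: LightRandomizer
```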
The component manager is the central part of ORRB. Its main task is instantiating, running and updating the renderer components. A renderer component is an encapsulated module responsible for a single scene augmentation or randomization. We use the same framework for some interactive tools. A renderer component can be attached to any entity in the scene's kinematic hierarchy, with the goal of modifying the properties of that singular entity or the entities in its kinematic sub-tree. Sub-entities can be filtered by type and name. This framework also allows the renderer components to emit auxiliary outputs, which can be used both in training auxiliary tasks and in rendering adaptation techniques.

To the extent possible, randomization and rendering is deterministic. A seed can be provided with every batch, or every state. The seed is used to reinitialize the random number generator.

Figure 2: System overview.

We put diligent care into making sure that the randomizers are executed in the same order, and that the flow of the randomized code is exactly the same, leading to reproducible, identical randomization results. That said, in our current model, with OpenGL rendering and a number of GPU based rendering techniques, some low level pixel discrepancy of the final
images is unavoidable.

We release ORRB with a number of ready-made renderer components. These can be roughly divided into two categories: randomizers and scene setup utility components.

• MaterialRandomizer – randomizes the tint, metallic and glossiness levels in materials. Additionally it can assign random pattern and normal textures with randomized tiling and offset parameters,
• FixedHueMaterialRandomizer – randomizes the material appearance near some predefined, calibrated values. This randomizer operates in HSV color space and allows specifying separate randomization radii and clamping ranges for the three components. Additionally the metallic level, glossiness, emissive probability and emissive power can be randomized within configurable ranges,
• CameraRandomizer – randomizes the position, orientation and angle of view of the recording cameras. This randomizer operates in two modes: Jitter, where small random perturbations are applied, and Orbit, where the camera is randomly placed orbiting a point of interest specified in the scene. The two modes can both be applied at the same time,
• JointRandomizer – randomizes joint positions within their limit ranges,
• LightRandomizer – randomizes the intensity of scene lights. Allows specifying the range from which the total scene illumination will be drawn. Exposes configurable ranges for the random relative weights of each light's individual intensity contributions, and for spotlight angles,
• PostprocessingRandomizer – randomizes the post processing effect parameters. We use the Unity PostProcessingV2 stack, and allow customized randomization ranges for: Ambient Occlusion intensity; Color Grading hue, saturation, contrast, brightness, temperature, and tint; Bloom intensity and diffusion radius; and finally Grain intensity, size, and coloration,
• Tracker – generates auxiliary outputs with screen space coordinates and bounding boxes for the tracked objects,
• TranslateRotateScale – modifies the local position, rotation, and scale of an entity in the kinematic hierarchy,
• Hide – enables and disables the rendering of a given entity,
• LookAt – orients the entity, pointing towards a specified target,
• LightSetup – creates a number of scene lights in a programmatic way. Allows specifying the ranges for distance from, and height above, a specified target.
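The deterministic execution model that ties the component manager and the randomizers together can be sketched in a few lines. Everything below is a simplified stand-in written in Python for brevity (ORRB itself is a Unity binary); the class and method names are hypothetical, but the mechanism, a fixed component order driven by a per-state reseeded generator, is the one described above.

```python
import random

class RendererComponent:
    """One encapsulated scene augmentation or randomization."""
    def run(self, rng: random.Random, state: dict) -> None:
        raise NotImplementedError

class TintRandomizer(RendererComponent):
    """Toy stand-in for MaterialRandomizer: draws a random RGB tint."""
    def run(self, rng, state):
        state["tint"] = tuple(round(rng.uniform(0.0, 1.0), 4) for _ in range(3))

class ToyJointRandomizer(RendererComponent):
    """Toy stand-in for JointRandomizer: joints within fixed limits."""
    def run(self, rng, state):
        state["joints"] = [round(rng.uniform(-1.0, 1.0), 4) for _ in range(5)]

class ComponentManager:
    """Runs components in a fixed order with a freshly seeded RNG per state,
    so identical seeds reproduce identical randomization results."""
    def __init__(self, components):
        self.components = components

    def process_state(self, seed: int) -> dict:
        rng = random.Random(seed)  # reinitialized for every state
        state = {}
        for component in self.components:  # fixed, deterministic order
            component.run(rng, state)
        return state
```

Calling `process_state` twice with the same seed yields identical tints and joint positions, while different seeds diversify the scene; this mirrors the per-batch and per-state seeding described above.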
The Camera Calibrator renderer component can be used to align the scene cameras with the footage from the physical ones. It is used in the interactive mode, and provides a rudimentary GUI to fine-tune the camera's position, rotation and angle of view.
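Aligning a virtual camera with physical footage amounts to making projected scene landmarks land on their observed pixel positions. The sketch below shows that criterion under a simple pinhole model; it is an assumption for illustration only, since ORRB's calibrator does this interactively through a GUI rather than with code like this.

```python
import math

def project(point, fov_y_deg, width, height):
    """Pinhole projection of a camera-space point (+z forward) to pixels."""
    x, y, z = point
    # Focal length in pixels from the vertical field of view.
    f = (height / 2) / math.tan(math.radians(fov_y_deg) / 2)
    return (width / 2 + f * x / z, height / 2 - f * y / z)

def reprojection_error(points_3d, observed_px, fov_y_deg, width, height):
    """Mean pixel distance between projected and observed landmarks.

    Fine-tuning camera pose and angle of view amounts to driving this
    quantity toward zero.
    """
    total = 0.0
    for p, (u_obs, v_obs) in zip(points_3d, observed_px):
        u, v = project(p, fov_y_deg, width, height)
        total += math.hypot(u - u_obs, v - v_obs)
    return total / len(points_3d)
```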
In order to achieve high throughput we removed unnecessary data stalls and GPU/CPU synchronization. The capture pipeline maintains a pool of render textures that are used in round robin order. Similarly, the destination textures for the whole batch are pre-allocated, and transferred back to CPU memory with bulk DMA once the whole batch is ready. Because consecutive frames have no data dependencies, rendering finalization on the GPU can happen in an asynchronous, pipelined way.

We support 4 rendering modes: the main RGB image, optionally with an additional transparency channel; the segmentation map; the depth map; and the screen space normal vector map.
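The round robin texture pool is the piece that lets frame N+1 start while frame N is still finalizing on the GPU. Here is a minimal sketch of the idea; the real implementation manages Unity render textures on the GPU, so plain byte buffers stand in, and the `RoundRobinPool` class is hypothetical.

```python
class RoundRobinPool:
    """Fixed pool of reusable buffers handed out in round robin order.

    A ring of render targets avoids per-frame allocation and lets several
    frames be in flight at once, provided the pool is at least as deep as
    the GPU pipeline.
    """
    def __init__(self, size, make_buffer):
        self.buffers = [make_buffer() for _ in range(size)]
        self.next = 0

    def acquire(self):
        buf = self.buffers[self.next]
        self.next = (self.next + 1) % len(self.buffers)
        return buf

# Example: a pool of three small RGB frame buffers.
pool = RoundRobinPool(size=3, make_buffer=lambda: bytearray(64 * 64 * 3))
frames = [pool.acquire() for _ in range(6)]  # buffers are reused cyclically
```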
ORRB is to a large extent client language agnostic. Any language that supports GRPC can communicate with the render service. We provide simple client side utilities for Python: server life-cycle management, and a parallel batch render executor. Additionally, we release a small Python suite of demos, benchmarks and unit tests.
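The parallel batch render executor idea can be illustrated as follows. This is not the shipped utility but a hedged reimplementation of the pattern: a stub function stands in for the GRPC `RenderBatch` call, and batches are fanned out across server addresses with a thread pool (threads suffice because a real RPC releases the GIL while waiting on the network).

```python
from concurrent.futures import ThreadPoolExecutor

def render_batch_stub(server_address, batch):
    """Stand-in for the GRPC render call: returns fake per-image ids."""
    return [f"{server_address}/frame_{i}" for i in range(len(batch))]

def render_all(servers, batches, render_fn=render_batch_stub):
    """Fan batches out across render servers in round robin order and
    gather the results in submission order."""
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        futures = [
            pool.submit(render_fn, servers[i % len(servers)], batch)
            for i, batch in enumerate(batches)
        ]
        return [f.result() for f in futures]
```

Swapping `render_batch_stub` for a real GRPC client call turns this into a working executor against a running render server.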
We have successfully run ORRB both on local machines operating under OSX and Ubuntu, and in server environments, i.e. on Kubernetes, Microsoft Azure and Google Cloud Platform virtual machines. True headless rendering is currently not supported in Unity. In order to make datacenter rendering possible we start a Xorg instance on top of Nvidia virtual devices. Then, we set up multiple virtual screens (in xorg.conf) to efficiently control multi-GPU rendering.
5 Performance

In order to measure the performance of our system, we used the environment from [10]. The scene depicts a Shadow Dexterous Hand holding a colorful block. It contains three cameras and three spot lights with soft shadow casting enabled. The renderer configuration consists of: calibrated material randomizers for the block, a material randomizer for the hand and the background, a camera jitter randomizer, a light position randomizer, a light intensity randomizer, and finally the postprocessing effects randomizer. We use a fixed batch size, so that many images are rendered in a single call. The images are rendered at low resolution in RGB color depth.

Measurements were performed on Google Cloud Platform. The benchmark machine was based on an n1-standard-96 VM with a 96 virtual core processor and NVidia V100 GPUs. We used Ubuntu 16.04 with NVidia 410.48 drivers; Unity 2018.3 was used to build the render server binary. Benchmark results were produced with the benchmark.py script shipped with ORRB.

The analysis of the performance benchmarks shows a number of interesting effects. First of all, it takes multiple render servers running in parallel to fully saturate a V100 GPU. As the number of render servers increases, the single threaded Python model becomes the bottleneck and the client code begins to throttle performance. In order to counter that, we run multiple, separate client processes, one per small group of render server instances. With this approach we were able to scale up to 3 GPUs. Further scaling is limited by insufficient CPU processing power.

Table 1: Performance of ORRB on V100 GPUs with different numbers of render servers and MPI clients working in parallel.
Table 2: GPU scaling - maximal throughput achieved with different numbers of V100 GPUs.

GPU count  1  2  3  4
FPS
Table 3: CPU scaling - maximal throughput achieved with different numbers of render servers (using 8 V100 GPUs).

Render servers    8   16   24   32   40   48   56   64   72   80   88
FPS             736 1215 1457 1876 2290 2628 2847 3065 3284 3370 3438
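The Table 3 numbers make the diminishing returns per render server explicit; a few lines of Python over the published data show the average throughput contributed by each server falling monotonically as servers are added.

```python
# CPU scaling data copied from Table 3 (8 V100 GPUs).
servers = [8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88]
fps = [736, 1215, 1457, 1876, 2290, 2628, 2847, 3065, 3284, 3370, 3438]

# Average frames per second contributed by each render server.
per_server = [f / s for f, s in zip(fps, servers)]
```

Per-server throughput drops from 92 FPS at 8 servers to about 39 FPS at 88, consistent with the CPU-bound scaling limit described in the text.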
Author contributions
Maciek Chociej led the development of ORRB, designed and implemented the core systems, the rendering components and the client library.

Peter Welinder designed and implemented auxiliary channel rendering and was involved in shaping the renderer API.

Lilian Weng designed and implemented screen space tracking and auxiliary outputs.

Acknowledgements
We used these open source libraries to build ORRB: GRPC, protobuf, MIConvexHull, pb_Stl, UnityFBXExporter & PostProcessingV2.

We would like to thank the following people at Unity Technologies: Danny Lange, Marwan Mattar & Vilmantas Balasevicius for their help during development and debugging.

We would also like to thank the following people for providing feedback on earlier versions of this manuscript: Josh Tobin & Wojciech Zaremba.
References

[1] D. Abel, A. Agarwal, F. Diaz, A. Krishnamurthy, and R. E. Schapire. Exploratory gradient boosting for reinforcement learning in complex domains. CoRR, abs/1603.04119, 2016.
[2] Epic Games. Unreal game engine.
[3] Google. GRPC. Available at: https://grpc.io/.
[4] M. Jaderberg, W. M. Czarnecki, I. Dunning, L. Marris, G. Lever, A. G. Castañeda, C. Beattie, N. C. Rabinowitz, A. S. Morcos, A. Ruderman, N. Sonnerat, T. Green, L. Deason, J. Z. Leibo, D. Silver, D. Hassabis, K. Kavukcuoglu, and T. Graepel. Human-level performance in 3d multiplayer games with population-based reinforcement learning. Science, 364(6443):859-865, 2019.
[5] A. Juliani, V.-P. Berges, E. Vckay, Y. Gao, H. Henry, M. Mattar, and D. Lange. Unity: A general platform for intelligent agents. CoRR, abs/1809.02627, 2018.
[6] A. Juliani, A. Khalifa, V. Berges, J. Harper, H. Henry, A. Crespi, J. Togelius, and D. Lange. Obstacle tower: A generalization challenge in vision, control, and planning. CoRR, abs/1902.01378, 2019.
[7] M. Kempka, M. Wydmuch, G. Runc, J. Toczek, and W. Jaskowski. ViZDoom: A Doom-based AI research platform for visual reinforcement learning. CoRR, abs/1605.02097, 2016.
[8] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. A. Riedmiller. Playing Atari with deep reinforcement learning. CoRR, abs/1312.5602, 2013.
[9] OpenAI. OpenAI Five, 2019. Available at: https://openai.com/five/.
[10] OpenAI, M. Andrychowicz, B. Baker, M. Chociej, R. Józefowicz, B. McGrew, J. W. Pachocki, J. Pachocki, A. Petron, M. Plappert, G. Powell, A. Ray, J. Schneider, S. Sidor, J. Tobin, P. Welinder, L. Weng, and W. Zaremba. Learning dexterous in-hand manipulation. CoRR, abs/1808.00177, 2018.
[11] F. Sadeghi and S. Levine. CAD2RL: Real single-image flight without a single real image. In Robotics: Science and Systems XIII, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA, July 12-16, 2017, 2017.
[12] S. Shah, D. Dey, C. Lovett, and A. Kapoor. AirSim: High-fidelity visual and physical simulation for autonomous vehicles. CoRR, abs/1705.05065, 2017.
[13] J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel. Domain randomization for transferring deep neural networks from simulation to the real world. arXiv preprint arXiv:1703.06907, 2017.
[14] E. Todorov, T. Erez, and Y. Tassa. MuJoCo: A physics engine for model-based control. In Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on, pages 5026-5033. IEEE, 2012.
[15] Unity Technologies. SimViz, 2018. Available at: https://blogs.unity3d.com/tag/simviz/.
[16] Unity Technologies. Unity 3d game engine, 2019. Available at: https://unity.com.
[17] O. Vinyals, I. Babuschkin, J. Chung, M. Mathieu, M. Jaderberg, W. M. Czarnecki, A. Dudzik, A. Huang, P. Georgiev, R. Powell, T. Ewalds, D. Horgan, M. Kroiss, I. Danihelka, J. Agapiou, J. Oh, V. Dalibard, D. Choi, L. Sifre, Y. Sulsky, S. Vezhnevets, J. Molloy, T. Cai, D. Budden, T. Paine, C. Gulcehre, Z. Wang, T. Pfaff, T. Pohlen, Y. Wu, D. Yogatama, J. Cohen, K. McKinney, O. Smith, T. Schaul, T. Lillicrap, C. Apps, K. Kavukcuoglu, D. Hassabis, and D. Silver. AlphaStar: Mastering the Real-Time Strategy Game StarCraft II. https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/, 2019.