Adapted Pepper
Maxime Caniot∗, Vincent Bonnet†, Maxime Busy, Thierry Labaye, Michel Besombes, Sebastien Courtois and Edouard Lagrue
SoftBank Robotics Europe
Abstract—One of the main issues in robotics is the lack of embedded computational power. Recent state-of-the-art algorithms providing a better understanding of the surroundings (object detection, skeleton tracking, etc.) require more and more computational power. This lack of embedded computational power is even more significant in mass-produced robots, whose hardware cannot easily follow the increasing computational requirements of state-of-the-art algorithms. The integration of an additional GPU makes it possible to overcome this limitation. We introduce in this paper a prototype of Pepper with an embedded GPU, and with an additional 3D camera on the head of the robot connected directly to this GPU. This prototype, called Adapted Pepper, was built for the European project MuMMER (MultiModal Mall Entertainment Robot) in order to embed algorithms such as OpenPose and YOLO, or to process sensor information and, in all cases, avoid network dependency for deported computation.
Index Terms—Pepper Robot, Adapted Pepper, Jetson TX2, Intel D435, Embedded Computation
I. INTRODUCTION

Deported computation is a common way to use state-of-the-art algorithms, such as OpenPose [1] and YOLO [2], without modifying an important part of the robot hardware. However, this solution is limited by network interface restrictions such as security access, disconnections, or bandwidth. These restrictions can cause latency or degrade the accuracy of the algorithms by reducing the amount of data that can be sent through the network (WiFi 802.11 [3]: from 11 to 600 Mbps; Gigabit Ethernet [4]: 1000 Mbps).

In the field of Human-Robot Interaction (HRI), a reactive architecture is as important as the content of the interaction. Latency in the interaction diminishes the satisfaction of the user [5], [6] and can even influence the emotional state of the user [7]. However, a response time of one or two seconds in an interaction with a human is suggested to be more humanlike [8], [9]. Reducing the performance of the algorithms used, or improving the hardware to embed the computation [10], can reduce the latency, with the benefit of a smoother interaction.

This dilemma was encountered in the European project MuMMER [11] (MultiModal Mall Entertainment Robot), whose goal was to develop a humanoid robot able to interact autonomously and naturally in a public shopping mall. A prototype called Adapted Pepper (Fig. 1) was developed to increase the quality of interaction by improving the hardware of Pepper. The Adapted Pepper is a prototype based on a Pepper 1.8 with an additional GPU and an additional 3D camera.

The integration of a GPU within Pepper has already been studied by the University of Chile [10] and by the University of Salerno [12]. These integrations had the same purpose as Adapted Pepper, namely to enhance the perception capabilities of the robot, specifically in the fields of:
• Object recognition
• Face detection
• Age and gender recognition
• Emotion recognition

In [10], the GPU was integrated as a backpack and directly connected to Pepper through an Ethernet cable. In the Adapted Pepper, the GPU and the camera are integrated further into the robot to answer design constraints.

This paper introduces the devices selected for the integration (the Intel D435 and the Jetson TX2 [13]), the experiments conducted on each device before their integration, and the integration of these devices into the robot. Finally, the software architecture is presented and validated by running Deep Learning models on the Adapted Pepper.

Fig. 1. Different views of the Adapted Pepper.
II. HARDWARE

As aforementioned, the Adapted Pepper is based on Pepper version 1.8 (http://doc.aldebaran.com/2-5/family/pepper_technical/index_pep.html). The integration of the camera and the GPU on the Pepper robot has to cope with several constraints. The main constraints are:
• Mechanical Constraints: the integration of the new devices must not deteriorate the mechanical functionalities of Pepper.
• Performance Constraints: the integration of the new devices must not deteriorate the behavior of the robot or their own functions.
• Temperature Constraints: the integration of the new devices must not increase the internal temperature of Pepper.
• Electrical Power Constraints: the consumption of the new devices must not severely deteriorate the autonomy of the robot.
• Design Constraints: the new devices must be integrated into the robot with a minimal impact on the current design of the robot.
A. Selected devices

1) Depth Camera:
Pepper 1.8 already has an embedded stereo camera, but the reconstruction of the 3D images is done by the CPU of the robot. In order to avoid overloading the CPU, the 3D reconstruction is based on fewer frames and images of lower resolution than the stereo camera could achieve. The resulting 3D reconstruction quality and frame rate are insufficient to analyze the environment at close range and in real time. The 3D camera of Pepper is designed for a mass-market product, whereas the research fields of MuMMER need higher camera performance. Moreover, a camera with an embedded processor saves the corresponding computational power on the robot CPU. In summary, an additional 3D camera with an embedded processor provides high-resolution images without burdening the CPU with costly 3D reconstruction, while respecting the aforementioned Performance Constraints.

Due to the architecture of Pepper 1.8, two configurations are possible for connecting the 3D camera. In the first configuration, the camera is connected to the head of the robot (the head itself being connected to the GPU): the images provided by the camera are sent to the robot head and then forwarded to the GPU. In the second configuration, the camera is directly connected to the GPU. As the goal is to send the images to the GPU, the second configuration is chosen: directly connecting the camera to the GPU avoids any latency and preserves the quality of the 3D images. The position of this additional camera, on the robot forehead, is chosen to allow human tracking and preserve the design of the robot. The use case for the 3D camera is human tracking (face, skeleton, gesture, etc.), which requires a high accuracy at short range (0 to 1.5 m). The NAOqi framework defines three engagement zones (http://doc.aldebaran.com/2-5/naoqi/peopleperception/alengagementzones.html):
• Zone 1, 0 to 1.5 m, also called the engagement zone: the robot can interact with humans (dialogue, face tracking, etc.).
• Zone 2, 1.5 to 2.5 m, also called the pre-engagement zone: the robot can only track a human.
• Zone 3, 2.5 m and beyond, also called the non-engagement zone: the robot can neither interact with humans nor track them.

Based on these specifications, the camera selected for the integration is the Intel D435. This camera is based on active infrared (IR) stereo: an IR projector is combined with two imagers. The projector projects non-visible static IR patterns that are captured by the left and right imagers and then sent to a dedicated depth imaging processor. An RGB camera is also embedded within the device. The RealSense D435 specifications are:
• Range (m): 0.2-10
• Depth resolution (pixels): 1280 x 720
• Interface: USB-C 3.1 Gen 1

This camera is chosen for its accuracy at short range [14]. The D435 indeed has a low bias (0.05 m for distances under 1.5 m) and a good precision (0.05 m for distances under 6 m). Furthermore, this device is not affected by the noise of additional sensors. Finally, its size allows an easier integration and adaptation to the robot design. In parallel, the integration needs to take into account the thermal sensitivity of the camera: the IR projections diverge with heat, which reduces the accuracy of the 3D images. The integration must account for the heat already produced within the head of Pepper by the embedded processor of the camera, as well as the heat produced by the top camera, since the added 3D camera is positioned next to the original top camera of Pepper. In order not to deteriorate the accuracy of the D435, the integration needs to dissipate this extra heat.
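Since the D435 streams directly to the Jetson over USB 3.0, its frames can be consumed locally with the official pyrealsense2 bindings. A minimal sketch (the stream resolutions and the engagement-zone check are our illustrative choices, not the project's actual pipeline):

```python
import pyrealsense2 as rs

# Configure depth and color streams from the D435 (resolutions chosen
# for illustration; the depth sensor supports up to 1280x720).
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 1280, 720, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
pipeline.start(config)

try:
    frames = pipeline.wait_for_frames()
    depth = frames.get_depth_frame()
    # Distance at the image center, in meters: useful to decide which
    # NAOqi engagement zone a tracked person falls into.
    d = depth.get_distance(640, 360)
    zone = 1 if d <= 1.5 else (2 if d <= 2.5 else 3)
    print(f"center depth: {d:.2f} m -> engagement zone {zone}")
finally:
    pipeline.stop()
```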
2) GPU:
The current processor of Pepper 1.8 does not provide enough computational power to support Deep Learning algorithms and the behaviors of the robot in parallel. Adding a new processor with a GPU allows sizeable Deep Learning models to run without overloading the computational power of the robot. The D435 camera is connected to this additional GPU so that the 2D/3D images can be used as inputs for Deep Learning algorithms. The GPU and the D435 camera are powered by the battery of the robot, which provides 795 Wh (http://doc.aldebaran.com/2-5/family/pepper_technical/battery_pep.html). In order to prevent a severe loss in autonomy and respect the Electrical Power Constraints, the energy budget of the selected GPU is limited to a maximum of 10% (79.5 Wh) of the battery capacity; a back-of-envelope check of this budget is sketched at the end of this subsection. This constraint narrows the list of candidates to embedded chips, which consume less power than data center chips and cards [15]. The dimensions are also an important factor for the integration: to respect the Design Constraints, the new device needs to be integrated while minimizing the impact on the appearance of the robot. In Pepper 1.8, a position behind the tablet in the torso (Fig. 6a) can be used to integrate a small device with maximum dimensions of 140x140x40 mm. The Jetson TX2 offers a good trade-off between performance and dimensions for the architecture of the prototype. Moreover, Nvidia provides CUDA (https://developer.nvidia.com/cuda-zone), TensorRT (https://developer.nvidia.com/tensorrt), cuDNN (https://developer.nvidia.com/cudnn), etc. The interface between the Jetson and the robot is provided by the Connect Tech Elroy carrier board (http://connecttech.com/product/asg002-elroy-carrier-for-nvidia-jetson-tx2-tx1), which is designed for the Jetson devices (TX1 and TX2/TX2i) and measures 87x50 mm. The final module to integrate (Fig. 5) is composed of:
• A power board
• An additional SSD for storing datasets for Deep Learning algorithms
• A carrier board
• A Jetson TX2
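As a quick check of the 10% budget (our arithmetic, combining the battery figure above with the roughly 30 W worst-case module draw measured under stress later in this section):

```python
# Back-of-envelope check of the 10% energy budget (our arithmetic,
# using figures quoted in the text, not new measurements).
battery_capacity_wh = 795.0              # Pepper battery capacity
budget_wh = 0.10 * battery_capacity_wh   # self-imposed limit: 79.5 Wh

module_power_w = 30.0                    # worst-case module draw under stress
hours_at_full_load = budget_wh / module_power_w
print(f"budget: {budget_wh:.1f} Wh -> {hours_at_full_load:.1f} h at full load")
# -> about 2.6 h of continuous full-load operation within the budget
```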
B. Experiments

1) Depth Camera:
The camera is attached to the forehead of Pepper. The main issue is to select the type of fixation for the camera that best avoids overheating. Temperature measurements are acquired through thermocouple sensors installed on the heat sink of the camera, in a stabilized environment (25 °C +/- 1 °C), for three different positions (Fig. 2).

Fig. 2. Intel D435 in different configurations: (a) Intel D435 camera on a tripod, (b) Intel D435 camera fixed on the forehead and spaced from the shell, (c) Intel D435 camera fixed on the forehead and built into the shell.
To measure the maximum temperature of the camera, the 3D and 2D images are streamed during one hour, using the official library provided by the manufacturer (librealsense, https://github.com/IntelRealSense/librealsense), in order to raise the temperature of the camera. To take into account the heat of the top camera of Pepper, the measurements in configurations 2b and 2c are done with the robot switched on. The heat sink of the camera is passive, meaning that the dissipation relies on air convection: the camera must not be enclosed for the passive heat sink to be efficient. The measurements confirm this sensitivity, with a difference of 5 °C between configuration 2b and configuration 2c (Fig. 3). Configuration 2a is considered the optimal position for the 3D camera, and the temperature measured in configuration 2b is far closer to configuration 2a than to configuration 2c. Configuration 2b therefore allows an optimal use of the 3D camera while respecting the Design Constraints.

Fig. 3. Ambient temperature and heat sink temperature of the camera for three different positions: Intel D435 camera on a tripod (configuration 2a), Intel D435 camera fixed on the forehead and spaced from the shell (configuration 2b), and Intel D435 camera fixed on the forehead and built into the shell (configuration 2c).
2) GPU:
First, the heat produced by the Jetson TX2 is characterized with the current drawn at full capacity. For this characterization, the CPU and GPU are used at the maximum of their capacity for one hour: the CPU is stressed with the stress-ng command (https://manpages.ubuntu.com/manpages/artful/man1/stress-ng.1.html) and the GPU is stressed by an overload of matrix scalar multiplications written in CUDA. Three different temperatures are measured:
• Ambient Air Temperature: temperature of the ambient air.
• Jetson Outer Temperature: temperature on the heat sink of the Jetson.
• Jetson Inner Temperature: temperature of the GPU of the Jetson, estimated by the software of the Jetson (see the thermal design guide, http://developer.nvidia.com/embedded/dlc/jetson-tx2-thermal-design-guide).

The temperatures are captured in two different setups: cooling the Jetson with a passive heat sink only, or with both the passive heat sink and an active one (fan). With the passive heat sink only, the Jetson Inner Temperature rises to 100 °C in only eight minutes (Fig. 4a); the Jetson safety is engaged and the system shuts down. As aforementioned, the inner temperature is estimated, not measured. The consequence of this estimation is visible as outliers in Fig. 4a. These outliers are a byproduct of the GPU stressing script and correspond to brief inactivity periods in the matrix scalar multiplication loop: the script overloads the GPU with matrix computation tasks, and when one scalar multiplication is over, the program sends another matrix computation to the GPU. The graph also shows the maximum temperature threshold allowed by the processor (100 °C). The second setup provides better cooling thanks to the active heat sink: the temperature only rises to 80 °C (Fig. 4b). Besides, the same gap between the inner (estimated) and outer temperatures is measured in both setups: a difference of 20 °C can be seen in Fig. 4a and Fig. 4b. Under similar temperature conditions, this correlation between the inner and outer temperatures can therefore be reused in other experiments.

Fig. 4. Temperature measurements (inner/outer temperature of the Jetson and ambient air temperature) during the stress of the Jetson CPU/GPU with: (a) passive heat sink, (b) active heat sink.

The active heat sink is able to stabilize the temperature and provides a safety margin, preventing the Jetson from reaching the maximum temperature authorized by the system. The current drawn by the complete module (3D camera and Jetson module, Fig. 5) is also measured while running a stress test. The complete module is powered with DC current. The estimated maximum power drawn by the complete module does not exceed 30 W, so one hour at full load consumes at most 30 Wh: the complete module respects the aforementioned constraint of 10% of the energy provided by the robot battery (10% of 795 Wh). Moreover, the Jetson module includes a DC/DC converter to adapt the current.
The converter also has the capacity to handle twice the required current, which reduces the temperature produced by this component inside the robot. Finally, a delay block is connected to synchronize the start-up of the robot with that of the complete module. This device avoids safety issues with the robot's own battery, which would otherwise shut down due to a high current draw. Fig. 5 presents an exploded view of the five parts that compose the Jetson module, two of which (the power board and the cradle) were designed by our team.

Fig. 5. Exploded view of the Jetson TX2 module.

This module is integrated behind the tablet of the robot. The isolation of the cooling flow of the Jetson module from the inside of Pepper is crucial to preserve a good thermal exchange (Fig. 6).

Fig. 6. Integration of the module on Pepper: (a) Pepper with the module integrated, (b) module with the cooling part.
To validate the integration, temperatures are measured at different locations and in two different setups. First, the temperatures of a Pepper 1.8 without the integrated module are measured (Fig. 7a). With the robot turned on, the Tablet Temperature (temperature behind the tablet), the Chest Temperature (temperature inside the chest of the robot), and the Ambient Air Temperature are collected. These values are indicators of the standard temperatures during normal use of the robot: the temperatures of the chest and behind the tablet do not exceed 40 °C with an ambient temperature of 25 °C. In the second setup, the new module is integrated and the temperatures are measured while stressing the module with the matrix scalar multiplication script. The temperatures of the first setup (behind the tablet, in the chest, ambient air) are acquired again, along with an extra one: the Jetson Heat Sink Temperature (Jetson Outer Temperature).

As in Fig. 7a, the temperature in the chest (Fig. 7b) rises to 40 °C: the internal temperature of the robot is not impacted by the integration of the Jetson module. The temperature behind the tablet rises to 46 °C because of the heat produced by the GPU system. As the temperature conditions are similar to those of Fig. 4, the inner temperature can be estimated by adding the 20 °C difference to the outer temperature. Hence, the internal temperature of the Jetson integrated into Pepper should not exceed 80 °C under a GPU/CPU overload. Moreover, the thermal oscillation seen on the graph is due to the activation and deactivation of the active heat sink by the Jetson: the fan starts when the processor reaches 82 °C and stops when the temperature drops below 73 °C. Please note that these measurements are done in extreme conditions; measurements in nominal conditions would yield lower temperatures.

Fig. 7. Measurement of the tablet temperature, the chest temperature, the ambient air temperature and the Jetson outer temperature in two different setups: (a) Pepper without the integrated module, (b) Pepper with the integrated module.
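The kind of periodic temperature logging behind Fig. 4 and Fig. 7 can be reproduced with standard Linux interfaces. A minimal sketch, assuming a stock Jetson Linux image where the on-module sensors appear as sysfs thermal zones (zone names vary between L4T releases; the thermocouple readings obviously require separate hardware):

```python
import glob
import time

# Jetson (and Linux in general) exposes on-chip temperature sensors as
# sysfs thermal zones, reported in millidegrees Celsius.
ZONES = sorted(glob.glob("/sys/devices/virtual/thermal/thermal_zone*"))

def read_zone(zone):
    with open(zone + "/type") as f:
        name = f.read().strip()
    with open(zone + "/temp") as f:
        celsius = int(f.read()) / 1000.0
    return name, celsius

while True:
    stamp = time.strftime("%H:%M:%S")
    readings = ", ".join(f"{n}={t:.1f}C" for n, t in map(read_zone, ZONES))
    print(stamp, readings)
    time.sleep(10)  # one sample every 10 s over the one-hour stress run
```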
3) Connection between the robot and the devices:
To connect the different devices to the robot, an Ethernet cable (between the head of Pepper and the Jetson) and a USB 3.0 cable (between the 3D camera and the Jetson) are used. Because the head of the robot is designed to be easily removable, passing the cables through the neck would require too much integration work. To limit the integration time and respect the integration constraints, the cables are therefore routed outside of the robot and enter through the chest instead of the neck. The different modules are connected with an external white sheath, long enough not to disturb the movements of the head of the robot (Fig. 8). To minimize the impact on the design of the robot, the cables pass along the head and the torso instead of connecting directly to the Jetson module.

Fig. 8. Connection between the robot and the devices through the chest with an external white sheath.

III. SOFTWARE
Firstly, this section introduces the software architecture of the Adapted Pepper. Secondly, the whole prototype is validated by running Deep Learning models.
A. Architecture
The architecture (Fig. 9) is composed of three different elements communicating with each other: the robot, the Jetson TX2 (with Jetpack version 3.3 installed, https://developer.nvidia.com/embedded/jetpack-3_3) and the Intel D435 camera. The architecture is based on ROS [16] to interface the different libraries of the three blocks. The Jetson communicates with the camera thanks to the librealsense library and its ROS wrapper, realsense-ros (https://github.com/IntelRealSense). The communication between the head of the robot and the Jetson is based on a ROS wrapper, naoqi_driver (https://github.com/ros-naoqi/naoqi_driver), and more precisely on the libqi library (https://github.com/aldebaran/libqi). To use the libqi or libqi-python (https://github.com/aldebaran/libqi-python) libraries, a modification of the libraries is needed to enable their compilation for the ARM architecture of the Jetson. These libraries allow the connection with NAOqi [17], the framework running on the robot.

Fig. 9. Architecture of the three different elements communicating with each other: Pepper, the Jetson TX2 and the Intel D435 camera.
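To illustrate this bridge, a minimal libqi-python sketch connecting to NAOqi from the Jetson (the hostname is a placeholder; NAOqi listens on port 9559 by default):

```python
import qi

# Open a session to the NAOqi framework running in Pepper's head.
# "pepper.local" is a placeholder for the robot's actual address.
session = qi.Session()
session.connect("tcp://pepper.local:9559")

# Any NAOqi service can then be called from the Jetson side,
# e.g. making the robot speak once a detection is confirmed.
tts = session.service("ALTextToSpeech")
tts.say("Perception is running on the embedded GPU.")
```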
B. Experiments

Two different models are tested: OpenPose and YOLO. To validate the architecture and the prototype, the computation of both models needs to be embedded, so as not to overload the network bandwidth and to remove the network dependency. Moreover, the prototype needs to be faster than a "standard" Pepper using external computation.
1) OpenPose:
OpenPose provides different models: MPII, COCO and BODY_25. Only the COCO and BODY_25 models are compared in this section. To optimize the OpenPose inference, the Caffe model file of a trained model is converted with TensorRT (TRT) into a GIE (Nvidia GPU Inference Engine) object better suited to the GPU of the Jetson. This operation is possible for the COCO model, but issues are encountered with the BODY_25 model: the TRT library (version 4.0) does not support the PReLU layers required to run this OpenPose model. A first workaround would be to create a TRT plugin, but experiments show that this method breaks the CBR (combination of Convolution, Bias and activation ReLU) fusion in TRT and is very slow. Another workaround is to replace each PReLU operation by a combination of a ReLU layer, a Scale layer and an ElementWise addition layer, as sketched after Table I. However, the Caffe model would then need to be retrained with this modified version of the BODY_25 model; the training can be burdensome and there is no guarantee on the final accuracy or speed of the new model. In a future version, an upgrade to Jetpack 4.3, which supports PReLU thanks to TRT 6.0, is considered.

The accuracy and the framerate obtained by three different models are compared: COCO (Caffe and TRT models) and BODY_25 (Caffe model). Because of the computational power limitation of the device, it was not possible to run OpenPose with a high input resolution: the input net resolution must be reduced below 256x256 to get a framerate above 5 FPS (Tab. I), and even further for the COCO model. The inference speed of the COCO model is greatly improved by TRT (1.8 times faster) and becomes comparable to the inference speed of the BODY_25 model.
TABLE I
NUMBER OF FRAMES PER SECOND OF THREE OPENPOSE MODELS RUNNING ON THE JETSON TX2

Input net resolution | BODY_25 (Caffe) | COCO (Caffe) | COCO (TRT)
368x368              |             2.4 |          0.9 |        1.7
256x256              |             5.1 |          1.8 |        5.7
256x128              |             8.9 |          6.6 |        8.6
128x128              |            15.7 |          8.5 |       13.6
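The PReLU replacement mentioned above rests on a simple identity, PReLU(x) = ReLU(x) - a * ReLU(-x), which only uses layer types that TRT 4.0 already supports (ReLU, Scale, ElementWise addition). A purely illustrative NumPy check of the identity:

```python
import numpy as np

def prelu(x, a):
    # Reference PReLU: x for x > 0, a * x otherwise.
    return np.where(x > 0, x, a * x)

def prelu_decomposed(x, a):
    # ReLU keeps the positive part; -a * ReLU(-x) reproduces the
    # a * x slope on the negative part, combined by element-wise addition.
    return np.maximum(x, 0.0) - a * np.maximum(-x, 0.0)

x = np.linspace(-3.0, 3.0, 13)
assert np.allclose(prelu(x, 0.25), prelu_decomposed(x, 0.25))
```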
For each model, the Average Precision (AP) and Average Recall (AR) on the COCO [18] validation set (2017) are evaluated. Whereas both COCO models obtain, as expected, a similar score, the accuracy of BODY_25 is better at every resolution (Fig. 10). However, in order to reach the maximum AP reported in [1] (61.8%), the net resolution would need to be above 368x368.

Fig. 10. Metrics on the COCO validation set (2017): (a) Average Precision (at human keypoint similarity = .50:.05:.95) of the different models, (b) Average Recall (at human keypoint similarity = .50:.05:.95) on human keypoints of the different models.
In order to preserve an optimal accuracy and framerate while another neural network runs in parallel, a net resolution of 256x256 with the BODY_25 model is an appropriate compromise.
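For reference, this compromise maps directly onto OpenPose's configuration parameters. A minimal sketch, assuming OpenPose was built with its Python bindings enabled (the model folder path is a placeholder, and the exact emplaceAndPop signature varies between OpenPose releases):

```python
import cv2
from openpose import pyopenpose as op

params = {
    "model_folder": "/path/to/openpose/models",  # placeholder path
    "model_pose": "BODY_25",
    "net_resolution": "256x256",  # the compromise discussed above
}
wrapper = op.WrapperPython()
wrapper.configure(params)
wrapper.start()

datum = op.Datum()
datum.cvInputData = cv2.imread("frame.png")
wrapper.emplaceAndPop(op.VectorDatum([datum]))  # plain [datum] in older releases
keypoints = datum.poseKeypoints  # (people, 25, 3) array, or None
print(keypoints.shape if keypoints is not None else "no person detected")
```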
2) OpenPose and YOLO:
The usage of YOLO and OpenPose in parallel is tested to simulate a use case of the aforementioned MuMMER project: head pose estimation from OpenPose together with object detection. The inference speed of tiny YOLO v3 alone is already high enough on the Jetson (25.0 FPS with a 2D image resolution of 640x480 pixels and an input net resolution of 416x416). The YOLO model is adapted to interface with the input of the Intel D435 camera (https://github.com/softbankrobotics-research/darknet_ros). Three ROS nodes are launched in parallel:
• realsense (color camera resolution of 640x480 pixels)
• OpenPose (input net resolution of 256x256 and BODY_25 model)
• tiny YOLO v3 (input net resolution of 416x416)

As a consequence of running two models in parallel, their respective framerates (Tab. II) are divided by two.
The framerate of OpenPose dropped from 5.1 to 2.6 FPS. The bottleneck of the architecture is thus the embedded computational power and not the bandwidth of the network interface.
TABLE II
FRAMERATE OF TWO MODELS RUNNING IN PARALLEL ON THE JETSON TX2

Model        | Framerate (images per second)
OpenPose     | 2.6
Tiny YOLO v3 | 11.6
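The three nodes exchange images over ROS topics. A minimal sketch of a consumer node in rospy, assuming the default realsense-ros color topic name (actual topic names depend on the launch configuration):

```python
import rospy
from sensor_msgs.msg import Image
from cv_bridge import CvBridge

# Convert incoming ROS image messages to OpenCV frames for the detectors.
bridge = CvBridge()

def on_image(msg):
    frame = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
    # Hand the frame to OpenPose / YOLO here.
    rospy.loginfo_throttle(5, "frame %dx%d" % (frame.shape[1], frame.shape[0]))

rospy.init_node("detector_input")
rospy.Subscriber("/camera/color/image_raw", Image, on_image, queue_size=1)
rospy.spin()
```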
3) Embedded vs non-Embedded:
The performance of the Adapted Pepper (embedded computation) is compared with the performance of a "standard" Pepper 1.8 (non-embedded computation). The images from the top camera of Pepper 1.8 are sent through WiFi to an external computer embedding a 1080 Ti GPU. The images retrieved by the external computer with naoqi_driver (resolution of 640x480 pixels) are used as input for the YOLO model (tiny YOLO v3, input net resolution of 416x416). This setup reaches a framerate of 10.3 images per second, whereas the embedded computation obtains a much better framerate of 25.0 FPS. The non-embedded performance is limited by the WiFi bandwidth: the images are only received through WiFi at a frequency of 11.4 FPS. This bottleneck is not observed with the Jetson TX2 directly connected to the Intel D435 camera. The bottleneck due to the network interface is also highlighted with the stereo images of Pepper 1.8: whereas the Adapted Pepper obtains 30 FPS from the D435 camera at every 3D resolution, the framerate of the stereo camera of Pepper 1.8 collected through WiFi reaches at most 15 FPS, at a requested resolution of 320x180 pixels (Tab. III).
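A rough estimate shows why the WiFi link saturates at these framerates (our arithmetic, assuming uncompressed 8-bit BGR frames; actual traffic depends on the transport and any compression used by naoqi_driver):

```python
# Raw bitrate of the image stream received over WiFi (our arithmetic,
# not a measurement from the paper's logs).
width, height, bytes_per_pixel = 640, 480, 3
fps_received = 11.4  # framerate observed over WiFi
mbps = width * height * bytes_per_pixel * 8 * fps_received / 1e6
print(f"~{mbps:.0f} Mbps of raw pixels")  # ~84 Mbps, already close to
                                          # practical 802.11n throughput
```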
TABLE III
FRAMERATE OF THE STEREO IMAGES OF PEPPER THROUGH WIFI

Camera resolution (pixels) | Framerate (images per second)
1280x720                   | 1.6
640x360                    | 4.8
320x180                    | 14.8

IV. CONCLUSION

This paper presents a prototype called Adapted Pepper, embedding a GPU (Jetson TX2) and an additional camera (Intel RealSense D435). These hardware modifications increase the computational power and the sensing capabilities of the robot, allowing Deep Learning algorithms to run in an embedded fashion and hence improving the capabilities of Pepper. By removing the network factors, the robot is no longer network dependent, is more secure, and the bottleneck of the solution becomes the embedded GPU capabilities, which is a controlled factor. However, the total computational power available on the robot remains limited compared to graphics cards for computers (like the one used in the experiments), and integration and optimization efforts were needed to optimize the inference speed. To minimize the bandwidth usage, preprocessing the sensor data with embedded computation reduces the amount of data to be sent to an external processing unit. The prototype could be improved by using the latest version of Jetpack, which offers updates of essential libraries (CUDA, TensorRT, cuDNN, etc.). Replacing the Jetson TX2 with a device offering more computational power could also be envisioned to improve the overall performance.
REFERENCES
[1] Zhe Cao, Gines Hidalgo, Tomas Simon, Shih-En Wei, and Yaser Sheikh. OpenPose: Realtime multi-person 2D pose estimation using part affinity fields, 2018.
[2] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection, 2015.
[3] Sneha V Sangolli and T Jayavignesh. TCP throughput measurement and comparison of IEEE 802.11 legacy, IEEE 802.11n and IEEE 802.11ac standards. Indian Journal of Science and Technology, 8:20, 2015.
[4] Jens Mache. An assessment of gigabit ethernet as cluster interconnect. In ICWC 99. IEEE Computer Society International Workshop on Cluster Computing, pages 36-42. IEEE, 1999.
[5] Werner Kuhmann, Wolfram Boucsein, Florian Schaefer, and Johanna Alexander. Experimental investigation of psychophysiological stress-reactions induced by different system response times in human-computer interaction. Ergonomics, 30(6):933-943, 1987.
[6] Ben Shneiderman, Catherine Plaisant, Maxine Cohen, Steven Jacobs, Niklas Elmqvist, and Nicholas Diakopoulos. Designing the User Interface: Strategies for Effective Human-Computer Interaction. Pearson, 2016.
[7] Euijung Yang and Michael C Dorneich. The effect of time delay on emotion, arousal, and satisfaction in human-robot interaction. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting, volume 59, pages 443-447. SAGE Publications, Los Angeles, CA, 2015.
[8] Masahiro Shiomi, Takashi Minato, and Hiroshi Ishiguro. Subtle reaction and response time effects in human-robot touch interaction. In International Conference on Social Robotics, pages 242-251. Springer, 2017.
[9] Toshiyuki Shiwa, Takayuki Kanda, Michita Imai, Hiroshi Ishiguro, and Norihiro Hagita. How quickly should a communication robot respond? Delaying strategies and habituation effects. International Journal of Social Robotics, 1(2):141-155, 2009.
[10] Esteban Reyes, Cristopher Gómez, Esteban Norambuena, and Javier Ruiz-del-Solar. Near real-time object recognition for Pepper based on deep neural networks running on a backpack. arXiv preprint arXiv:1811.08352, 2018.
[11] Mary Ellen Foster, Rachid Alami, Olli Gestranius, Oliver Lemon, Marketta Niemelä, Jean-Marc Odobez, and Amit Kumar Pandey. The MuMMER project: Engaging human-robot interaction in real-world public spaces. In International Conference on Social Robotics, pages 753-763. Springer, 2016.
[12] Alessia Saggese, Mario Vento, and Vincenzo Vigilante. MIVIABot: A cognitive robot for smart museum. In International Conference on Computer Analysis of Images and Patterns, pages 15-25. Springer, 2019.
[13] NVIDIA Jetson TX2 Delivers Twice the Intelligence to the Edge. NVIDIA Developer Blog, Mar 2017. [Online; accessed 23 Jan. 2020].
[14] G. Halmetschlager-Funek, M. Suchi, M. Kampel, and M. Vincze. An empirical evaluation of ten depth cameras: Bias, precision, lateral noise, different lighting conditions and materials, and multiple sensor setups in indoor environments. IEEE Robotics & Automation Magazine, 26(1):67-77, March 2019.
[15] Albert Reuther, Peter Michaleas, Michael Jones, Vijay Gadepally, Siddharth Samsi, and Jeremy Kepner. Survey and benchmarking of machine learning accelerators. arXiv preprint arXiv:1908.11348, 2019.
[16] Morgan Quigley, Ken Conley, Brian Gerkey, Josh Faust, Tully Foote, Jeremy Leibs, Rob Wheeler, and Andrew Y Ng. ROS: an open-source Robot Operating System. In ICRA Workshop on Open Source Software, volume 3, page 5. Kobe, Japan, 2009.
[17] Aldebaran Robotics. NAOqi framework. [Online]. Available: http://doc.aldebaran.com/2-5/index_dev_guide.html.
[18] Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, and Piotr Dollár. Microsoft COCO: Common objects in context, 2014.

APPENDIX
A. Camera Integration

Fig. 11. Camera integration: (a) exploded view of the camera integration, (b) integrated result of the camera integration.
B. Jetson TX2 Module Integration