Autonomous Social Distancing in Urban Environments using a Quadruped Robot
Tingxiang Fan*, Zhiming Chen*, Xuan Zhao, Jing Liang, Cong Shen, Dinesh Manocha, Jia Pan†, and Wei Zhang†
Abstract—The COVID-19 pandemic has become a global challenge faced by people all over the world. Social distancing has been proven to be an effective practice to reduce the spread of COVID-19. Against this backdrop, we propose that surveillance robots can not only monitor but also promote social distancing. Robots can be flexibly deployed, and they can take precautionary actions to remind people to practice social distancing. In this paper, we introduce a fully autonomous surveillance robot based on a quadruped platform that can promote social distancing in complex urban environments. Specifically, to achieve autonomy, we mount multiple cameras and a 3D LiDAR on the legged robot. The robot then uses an onboard real-time social distancing detection system to track nearby pedestrian groups. Next, the robot uses a crowd-aware navigation algorithm to move freely in highly dynamic scenarios. Finally, the robot uses a crowd-aware routing algorithm to effectively promote social distancing by using human-friendly verbal cues to send suggestions to over-crowded pedestrians. We demonstrate and validate that our robot can operate autonomously by conducting several experiments in various urban scenarios.

* denotes equal contribution. † denotes the corresponding author.
Tingxiang Fan and Jia Pan (email: [email protected]) are with the Department of Computer Science, The University of Hong Kong, Hong Kong, China.
Zhiming Chen and Wei Zhang (email: [email protected]) are with the Department of Mechanical & Energy Engineering, Southern University of Science and Technology, Shenzhen, Guangdong, China.
Xuan Zhao is with the Department of Biomedical Engineering, City University of Hong Kong, Hong Kong, China.
Dinesh Manocha and Jing Liang are with the Department of Computer Science, University of Maryland, College Park, USA.
Cong Shen is with the School of Mechanical Science & Engineering, Huazhong University of Science and Technology, Wuhan, Hubei, China.
Index Terms—Surveillance Robot; Social Distancing; Human-Robot Interaction
I. INTRODUCTION

The COVID-19 pandemic has quickly become the most dramatic and disruptive event experienced by people all over the world. People may need to live with the virus for a long time. One of the most effective measures to minimize the spread of the coronavirus is to promote social distancing. To this end, some applications have been developed on top of existing on-site closed-circuit television (CCTV) systems to detect social distancing violations. However, on-site monitoring systems are not ubiquitous in some areas and sometimes cannot cover all public corners. Furthermore, even when such a monitoring system detects a social distancing violation, it cannot take any proactive action to promote social distancing.
Compared to on-site monitoring systems, surveillance robots can be flexibly deployed to patrol desired public areas. Moreover, a robot can take precautionary actions to promote social distancing rather than merely monitoring it. These potential applications have been validated by teleoperated robots [1] and hybrid systems [2]. The hybrid system introduces external devices such as CCTV to help robots monitor social distancing. However, developing a fully autonomous surveillance robot for complex urban environments without any external device still encounters several challenges. First, to monitor the social distance between pedestrians without any external device, a robot-centric real-time perception system must run on onboard devices with limited computation. Second, in many urban scenarios, the robot needs to safely navigate through unstructured and highly dynamic environments. Third, more intelligent interactions with humans need to be designed to improve the efficiency of promoting social distancing.

In this paper, we introduce a fully autonomous surveillance robot to promote social distancing in complex urban environments. To achieve this autonomy, we first build the surveillance system with multiple cameras and a 3D LiDAR on a legged robot, which gives the robot omnidirectional perception and extends its traversability in complex urban terrains with uneven ground and stairs, which are challenging for normal wheeled mobile robots. Then, we develop an onboard real-time social distancing detection system with the ability to track the robot's nearby pedestrian groups. Next, the CrowdMove [3] algorithm is used to navigate the robot in highly dynamic environments. Finally, we develop a crowd-aware routing algorithm that allows the robot to approach over-crowded pedestrian groups and effectively promote social distancing using verbal cues. We also investigate the influence of human voices on the effectiveness and acceptability of quadruped surveillance and social distancing, because it has been reported that a robotic patrolling inspector can be terrifying for general citizens. We demonstrate that this surveillance robot can operate autonomously with satisfactory human response by conducting experiments in various urban scenarios.

The rest of this paper is organized as follows. Section II reviews the related work. Section III describes the hardware platform that the surveillance robot builds on. Section IV presents the robot's tracking algorithm used for social distancing detection. Section V illustrates the robot's navigation in urban scenarios. Section VI discusses the robot's interactions with humans through verbal communication. Section VII presents the experiments conducted to validate the proposed algorithms. Section VIII concludes this paper.

II. RELATED WORK
In this section, we give a brief overview of algorithms related to our system, including robotic perception, navigation, and interaction for surveillance robots.
A. Perception for Surveillance Systems
Pedestrian tracking has been widely applied in surveillance video analysis and is well developed based on research on multi-object tracking problems [4]–[7].

Discrete velocities have been used to model pedestrians' motion [8], [9]. Although discretization improves the efficiency of prediction, this approach cannot fully capture real-life continuous situations. Chung et al. developed cognitive models to improve the performance of their model [10], but they did not consider people's facing direction, using only circles to model pedestrians.

Helbing et al. proposed the social force model to model and predict people's movement according to an energy potential caused by people and obstacles [11]. Tracking performance was then improved by detecting abnormal events among pedestrians [12]. Pellegrini et al. developed Linear Trajectory Avoidance (LTA) to improve the accuracy of motion prediction [13]. [14]–[16] modeled social interactions among pedestrians to improve the accuracy of behavior models. Sheng et al. proposed the Robust Local Effective Matching Model to address partial detection of objects [17]. However, these approaches cannot describe pedestrians' dynamics in dense situations because they only use linear models. In our system, a nonlinear model, Frontal RVO (F-RVO) [4], is used to simulate motions in crowds and to model dynamic behaviors considering pedestrians' facing directions.

With the blossoming of deep learning, CNNs have been developed to extract the trajectory of a single object [18]–[20]. Chu et al. developed STAM to detect more objects [21]. Fang et al. improved tracking performance by using an RNN [6]. The authors in [7] developed the SORT model to track pedestrians. However, by tightly coupling detection and tracking, these approaches cannot always provide satisfactory performance in pedestrian detection. Mask R-CNN [22] and YOLO [23] are two state-of-the-art detection networks with sufficient performance for detection purposes; YOLO is much faster than Mask R-CNN and is thus more suitable for real-time tracking tasks.
B. Navigation in Urban Environments
Compared to a fixed video surveillance system, a surveillance robot not only has the above perception capabilities but also endows the surveillance camera with mobility. However, navigating a robot in urban environments is non-trivial. First, the robot inevitably interacts with dynamic obstacles such as pedestrians and bicycles.
Fig. 1: Overview of our hardware and software system.
Some studies have been proposed to deal with collision avoidance in such dynamic scenarios. [24], [25] proposed that each agent in a dynamic scenario should take half of the responsibility for collision avoidance. Based on that, they developed a multi-agent collision avoidance algorithm with zero communication. [26], [27] presented interacting Gaussian processes to capture cooperative collision avoidance behavior and introduced a cooperative planner for robot navigation. However, these algorithms fail to track a moving pedestrian without the assistance of external devices. [28], [29] deployed a LiDAR with multiple cameras on a robot to track surrounding pedestrians; to navigate the robot in crowds, they used reinforcement learning to train a socially aware collision avoidance policy. Different from the above algorithms, [30]–[32] proposed a sensor-level collision avoidance policy learned via reinforcement learning, which directly processes raw LiDAR data to generate collision-free actions.
C. Robot Interaction
Human-like characteristics of social robots influence users' responses. Among various social traits, gender is important for interpersonal relationships and evokes social stereotypes [33]. Previous research has pointed out that participants were more accepting of robots whose perceived gender conformed to their occupation's gender-role stereotypes (e.g., male security robots or female healthcare robots).
Fig. 2: Our system contains functional modules for tracking, mapping, localization, patrol planning, routing, and motion planning. The tracking module uses YOLO [23] and F-RVO [4] to extract similar detected objects across consecutive frames and to keep track of people. Mapping is achieved with the LeGO-LOAM algorithm, which is based on the 3D LiDAR sensor. For localization, we use the NDT algorithm to match LiDAR data and localize the robot in the generated map. According to the detected crowds and the map, the crowd-aware routing algorithm and the patrol planning algorithm help the robot determine the current target to approach. With all the information needed for motion planning, an end-to-end algorithm, CrowdMove, drives the robot toward the goal position. While approaching, if the robot detects that its distance to the crowd is lower than 5 meters, it starts to play a recorded voice command reminding people to keep a proper social distance.
However, perceived trust of the social robots was not influenced by gender-occupational role conformity [34]. In contrast, Kuchenbrandt et al. [35] found that participants, regardless of gender, evaluated male and female robots as equally competent when performing a stereotypically female task, but, in the context of a stereotypically male task, the female robot was rated as more competent than the male robot. Another study examining the effects of robot gender on human behavior found that participants were more likely to rate a robot of the opposite gender as more credible, trustworthy, and engaging [36]. Thus, the effects of user and robot attributes, as well as gender-role stereotypes, are still open questions.
III. HARDWARE PLATFORM
First, we introduce the hardware setup of our surveillance robot, which includes three components as shown in Figure 1: the mobile platform, the perception sensor-kit, and the computational platform.

• Mobile Platform: We deploy the Laikago (a dog-like legged robot) as our mobile platform for navigating complex urban environments. Compared to a wheeled robot, the legged robot has superior traversability and thus is more suitable for uneven and unstructured urban scenarios with stairs and bumps.

• Perception Sensor-Kit: To effectively detect and track pedestrians, we mount four color cameras evenly in the horizontal plane of the robot. Each camera is equipped with a short-focal-length lens with a wide horizontal FOV, so the combination of the four cameras covers almost all directions around the robot. Moreover, for better spatial perception, we use a RoboSense 3D LiDAR with 16 channels to measure the social distance between pedestrians. The 3D LiDAR also serves the navigation tasks of mapping, localization, and collision avoidance.

• Computational Platform: Two onboard computers process the aforementioned sensor data for different tasks. We use an NVIDIA Jetson AGX Xavier as the vision computational module; it supports a maximum of six CSI camera lanes as input and uses 512 CUDA cores to GPU-accelerate the processing of images captured by the cameras. Since other tasks such as mapping and localization mostly consume CPU resources, we also deploy an Intel NUC with an Intel i5-8259U CPU. The two computers are connected by a wired network, and the processed data is shared via the Robot Operating System (ROS), as sketched below.
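To give a flavor of the inter-computer data flow, the following minimal sketch publishes tracked pedestrian positions over ROS from the GPU computer so the CPU computer can subscribe to them. The topic name /tracker/pedestrians and the PoseArray message type are illustrative assumptions, not the actual interface used on the robot.

```python
#!/usr/bin/env python
# Minimal rospy sketch of sharing tracker output between the two onboard
# computers; topic name and message type are assumed for illustration.
import rospy
from geometry_msgs.msg import Pose, PoseArray

def publish_pedestrians(poses_xy):
    rospy.init_node('pedestrian_tracker')
    pub = rospy.Publisher('/tracker/pedestrians', PoseArray, queue_size=1)
    rate = rospy.Rate(10)  # publish at 10 Hz
    while not rospy.is_shutdown():
        msg = PoseArray()
        msg.header.stamp = rospy.Time.now()
        msg.header.frame_id = 'base_link'  # robot-centric frame
        for x, y in poses_xy:
            p = Pose()
            p.position.x, p.position.y = x, y
            msg.poses.append(p)
        pub.publish(msg)
        rate.sleep()

if __name__ == '__main__':
    publish_pedestrians([(1.0, 0.5), (2.3, -0.7)])  # dummy tracked positions
```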
IV. SOCIAL DISTANCING DETECTION
The tracking algorithm used in our system is composed of object detection, bounding box prediction, feature extraction, and sparse feature matching. We use YOLO to detect pedestrians and match sparse features with the help of the motion modeling algorithm F-RVO to update the traces of pedestrians.

Fig. 3: (a) We used LeGO-LOAM for mapping. (b) The blue arrows show the potential directions at each crossing and corner that the robot can take. Based on the map and the current position, the patrol algorithm chooses a navigation direction with the maximum probability of crowd appearance.
A. F-RVO
Modeling pedestrian behavior in crowds from the front view is challenging, not only because of non-linear, varied motions (turning shoulders, side walking, back stepping, etc. [37]), but also due to the occlusions that a front view may encounter. In this work, we use a velocity-obstacle based algorithm, F-RVO [4], to model pedestrian motion.

In F-RVO, each pedestrian $p_i$ is represented by an 8-dimensional vector $\Psi_t = [\mathbf{x}, \mathbf{v}, \mathbf{v}^{pref}, l, w]$, where $\mathbf{x}$ is the current position, $\mathbf{v}$ is the velocity, $\mathbf{v}^{pref}$ is the preferred velocity (we assume people prefer to walk along their facing direction), and $l$ and $w$ are the length and width of the person's shoulders. For each frame $\tau$, a half-plane constraint is used to determine the range parameter in F-RVO. Within the range, each pedestrian $p_i$ has a velocity obstacle region $VO^{\tau}_{p_i|p_j}$ with respect to each neighboring pedestrian $p_j$. The region of velocity obstacles considering all neighbors is

$FRVO^{\tau}_{p_i} = \bigcup_{p_j \in H_i} VO^{\tau}_{p_i|p_j}$,   (1)

where $H_i$ is the set of all neighbors of pedestrian $p_i$. Outside the velocity obstacle area, the best velocity is the one closest to the preferred velocity $\mathbf{v}^{pref}$:

$\mathbf{v}^{best} = \arg\min_{\mathbf{v} \notin FRVO^{\tau}_{p_i}} \|\mathbf{v} - \mathbf{v}^{pref}\|$.   (2)
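To make Eqs. (1)–(2) concrete, the sketch below approximates the best-velocity selection by rejection sampling: sample candidate velocities around $\mathbf{v}^{pref}$, discard any that fall inside some neighbor's velocity obstacle within the horizon $\tau$, and keep the closest survivor. It treats agents as discs rather than using F-RVO's shoulder geometry, so it illustrates the optimization rather than reproducing the authors' implementation.

```python
import numpy as np

def in_vo(v, p_i, p_j, v_j, r_sum, tau, steps=50):
    # v lies inside VO^tau_{pi|pj} if moving with relative velocity (v - v_j)
    # brings the two discs closer than r_sum before time tau.
    rel, dp = v - v_j, p_j - p_i
    ts = np.linspace(1e-3, tau, steps)
    return bool(np.any(np.linalg.norm(dp[None, :] - ts[:, None] * rel, axis=1) < r_sum))

def best_velocity(v_pref, p_i, neighbors, tau=2.0, n_samples=500, seed=0):
    # Eq. (2): among sampled velocities outside the union of all velocity
    # obstacles (Eq. (1)), return the one closest to the preferred velocity.
    rng = np.random.default_rng(seed)
    best, best_dist = np.zeros(2), np.inf
    for _ in range(n_samples):
        v = v_pref + rng.uniform(-1.0, 1.0, size=2)
        if any(in_vo(v, p_i, p_j, v_j, r, tau) for p_j, v_j, r in neighbors):
            continue  # v lies inside FRVO^tau_{p_i}
        d = np.linalg.norm(v - v_pref)
        if d < best_dist:
            best, best_dist = v, d
    return best
```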
B. DensePeds

The tracking algorithm, DensePeds, includes three components to track pedestrians: object detection, feature extraction, and feature matching, as shown in the upper part of Figure 2. In each time step, we first use YOLO to detect pedestrians and generate bounding boxes for them; these detected pedestrians make up a set $P$. Then we use F-RVO to predict another set of bounding boxes around the pedestrians $p_i \in P$. Given the bounding boxes computed in two adjacent time steps, we use the DeepSort CNN [7] to extract binary feature vectors from the sub-images determined by the bounding boxes. We then match these sparse features to find, in frame $t+1$, the best-matched pedestrians of frame $t$, and assign IDs to the pedestrians accordingly. In particular, the sparse features are matched in two steps. First, we find the most similar detected pedestrian for each predicted pedestrian using the cosine metric:

$h^{*}_{j} = \arg\min_{h_j} \{ d(f_{p_i}, f_{h_j}) \mid p_i \in P,\ h_j \in H_i \}$,   (3)

where $d(\cdot,\cdot)$ is the cosine metric, $f_{(\cdot)}$ is the feature extraction function, $p_i$ is one pedestrian in the set $P$ of all pedestrians in a frame, and $h_j$ is one detected pedestrian in the set $H_i$ of detected neighbors around pedestrian $p_i$. In the second step, we maximize the IoU overlap, i.e., the overlap between predicted boxes and original YOLO-detected boxes:

$\epsilon(i, j) = \frac{|B_{p_i} \cap B_{h_j}|}{|B_{p_i} \cup B_{h_j}|}$,   (4)

where $B_{p_i}$ and $B_{h_j}$ are the bounding boxes around $p_i$ and $h_j$, respectively. Matching a set of detected pedestrians to a set of predicted pedestrians with maximum overlap becomes a max-weight matching problem over the matrix $\epsilon(i, j)$, which can be solved efficiently with the Hungarian algorithm [38].

From the computed bounding boxes, we can roughly estimate the range and bearing between the robot and pedestrians. To estimate the crowds more accurately, we reproject the bounding boxes into the LiDAR coordinate frame to query the depth of each pedestrian. The RANSAC algorithm [39] is applied to filter out possible outlier points. If there is a large inconsistency between the LiDAR and visual estimates due to occlusion between pedestrians, the visual estimate is trusted. Finally, we obtain the social distance between pedestrians.
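A compact sketch of the second matching step described above: build the IoU matrix of Eq. (4) between F-RVO-predicted boxes and YOLO detections, then solve the resulting assignment problem with the Hungarian algorithm (here via SciPy's linear_sum_assignment). The 0.3 IoU gate is an assumed threshold for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    # Eq. (4) for axis-aligned boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def match_boxes(pred_boxes, det_boxes, iou_min=0.3):
    # Max-weight matching over epsilon(i, j): minimize negative IoU.
    cost = np.array([[-iou(p, d) for d in det_boxes] for p in pred_boxes])
    rows, cols = linear_sum_assignment(cost)
    # Keep only assignments with sufficient overlap; the rest spawn new tracks.
    return [(i, j) for i, j in zip(rows, cols) if -cost[i, j] >= iou_min]
```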
V. NAVIGATION IN URBAN SCENARIOS
In this section, we introduce the autonomous navigation algorithm for urban scenarios. We implement the mapping and localization functions with state-of-the-art LiDAR-based approaches. The navigation framework is formulated as a hierarchical structure: we develop a learning-based collision avoidance algorithm for local planning and use global planning to plan patrol trajectories for the robot. In addition, we describe the routing algorithm that enables the robot to effectively select a crowded region to approach in order to accomplish the patrol.
A. Mapping and Localization
Since LiDAR-based SLAM approaches have been well developed in recent years, we do not develop a new SLAM approach in this paper. To achieve the best performance on this legged robot, we choose the LeGO-LOAM algorithm, a lightweight system optimized for ground platforms [40]. The generated map is shown in Figure 3a.

After the robot has obtained the 3D point cloud map of the scenario, the Normal Distributions Transform (NDT) scan matching algorithm is used for localization [41]; it has been demonstrated in [42] to provide more reliable results than other matching methods such as Iterative Closest Point [43].

Although we can compute the 3D point cloud map and the robot's localization, it is not easy for the robot to determine the traversable region in the 2D plane. Therefore, we transform the 3D point cloud into a 2D laser scan by taking the closest point within a certain height range as a 2D laser point. Note that during navigation the robot may encounter uneven terrain such as stairs or steps; thus, the transform ignores the point cloud on the ground plane by filtering out points lower than 30 cm. After the transform, we obtain a 2D occupancy map for the subsequent navigation algorithms, as shown in Figure 3b. A minimal version of this conversion is sketched below.
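The sketch discards points below 30 cm (ground and traversable steps, as in the text) and above an assumed sensor-height cap, then keeps the nearest range in each angular bin. The 360-bin resolution and the 1.5 m cap are illustrative choices, not values taken from the paper.

```python
import numpy as np

def cloud_to_scan(points, z_min=0.30, z_max=1.5, n_beams=360):
    """Convert an (N, 3) point cloud into a pseudo 2D laser scan."""
    pts = np.asarray(points, dtype=float)
    # Ignore ground-plane and step points (z < 30 cm) and overhead points.
    mask = (pts[:, 2] >= z_min) & (pts[:, 2] <= z_max)
    xy = pts[mask, :2]
    ranges = np.full(n_beams, np.inf)
    if xy.size == 0:
        return ranges
    r = np.hypot(xy[:, 0], xy[:, 1])
    bins = (np.arctan2(xy[:, 1], xy[:, 0]) + np.pi) / (2 * np.pi) * n_beams
    bins = np.clip(bins.astype(int), 0, n_beams - 1)
    np.minimum.at(ranges, bins, r)  # closest point wins in each angular bin
    return ranges
```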
B. Patrol and Routing
Based on the generated map and the current position, we propose a patrol planning algorithm to navigate the robot around the mapped area. As shown in Figure 3, at different crossings and corners, the robot chooses the navigation direction that is optimal for promoting social distancing. In particular, the robot prefers the direction with a high probability that a crowd will appear.

When the robot detects gathered crowds, it suspends the patrol algorithm and switches to the routing algorithm to find an optimal way to approach the crowds. Considering the time constraints and the sizes of the crowds, we propose a crowd-aware routing algorithm based on depth-first search to find a sequence of intermediate waypoints for the robot to follow.

We formulate the routing problem as follows. Assume that there are $N$ groups of people within the robot's perception range. Each crowd is denoted as a node $n_i$, with a specific time-window constraint $t_i$ and a location relative to the robot. Each crowd is assigned a weight $w_i$ according to the number of persons in the group. The routing algorithm aims at finding an optimal path for the robot to approach as many crowds as possible with the least energy consumption. The optimization objective is

$cost = \sum_{p_j \in p_c} e_j - \sum_{i \in N} w_i - n_c$,   (5)

where $p_c$ is the current trajectory, which contains a set of points and edges denoting the positions of crowds and the paths connecting them. Each edge $p_j \in p_c$ between two positions has energy cost $e_j$. The number of crowds explored in $p_c$ is denoted $n_c$. A small search sketch is given below.

Given the directions and positions of the crowds produced by the routing algorithm, we use the SBPL lattice planner [44] to generate a smooth patrol route passing through these waypoints.
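The following sketch enumerates crowd visit orders by depth-first search, pruning branches that violate a crowd's time window, and scores each route with a cost in the spirit of Eq. (5). The node names, travel-time table, and per-crowd deadlines are illustrative assumptions.

```python
def route_cost(path, energy, weight):
    # Eq. (5)-style cost: edge energies minus crowd weights minus crowd count.
    e = sum(energy[(a, b)] for a, b in zip(path, path[1:]))
    w = sum(weight.get(n, 0.0) for n in path)
    return e - w - (len(path) - 1)  # the start node is not a crowd

def best_route(start, crowds, energy, weight, travel, deadline):
    """Depth-first search over visit orders, respecting time windows t_i."""
    best = {'cost': float('inf'), 'path': [start]}

    def dfs(path, t):
        c = route_cost(path, energy, weight)
        if len(path) > 1 and c < best['cost']:
            best['cost'], best['path'] = c, list(path)
        for n in crowds:
            if n in path:
                continue
            t_next = t + travel[(path[-1], n)]
            if t_next > deadline[n]:
                continue  # time-window constraint violated: prune branch
            dfs(path + [n], t_next)

    dfs([start], 0.0)
    return best['path'], best['cost']

# Toy usage: two hypothetical crowds 'A' and 'B' seen from robot position 'R'.
energy = {('R', 'A'): 2.0, ('R', 'B'): 3.0, ('A', 'B'): 1.0, ('B', 'A'): 1.0}
path, cost = best_route('R', ['A', 'B'], energy, {'A': 4.0, 'B': 6.0},
                        dict(energy), {'A': 10.0, 'B': 10.0})
```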
C. Learning-based Collision Avoidance

During patrol, the robot not only encounters static obstacles but also interacts with moving pedestrians. For this case, we deploy a learning-based collision avoidance approach, CrowdMove [3], for robotic navigation in crowds. The main training framework follows our previous work [30], which takes a 2D laser scan as input and outputs a velocity command. Multiple training scenarios with multiple robots are designed in the Stage simulator, as shown in Figure 4. We adopt the centralized learning, decentralized execution training paradigm, in which all robots share the same navigation policy during training. We thus obtain a multi-robot collision avoidance policy with zero communication. Furthermore, we have validated that the trained policy transfers from simulation to the real world without any re-tuning and that it is also suitable for single-robot navigation in crowds [31], [32]. To make the training framework work on our hardware platform, we take the transformed laser scan, which represents the local traversable area, as the input.

Fig. 4: Multi-robot multi-scenario training environments in the Stage simulator.
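For reference, here is a bare-bones scan-to-velocity policy network of the kind this pipeline requires. The layer sizes and architecture are illustrative assumptions, not the actual CrowdMove network from [3], [30], which also conditions on the relative goal and the robot's current velocity.

```python
import torch
import torch.nn as nn

class ScanPolicy(nn.Module):
    """Illustrative scan -> (linear, angular) velocity policy."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(128), nn.ReLU(),
            nn.Linear(128, 2),  # velocity command (v, w)
        )

    def forward(self, scan):
        # scan: (batch, n_beams) of clipped/normalized laser ranges
        return self.net(scan.unsqueeze(1))

policy = ScanPolicy()
cmd = policy(torch.rand(1, 360))  # e.g. tensor([[v, w]])
```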
VI. VOICE INTERACTION
In our surveillance scenario, we use verbal cues to send suggestions from the robot to humans. As mentioned before, the user's gender and the robot's gender may influence the user's acceptance of and trust in the robot. Thus, to achieve an effective surveillance result, we gave our robot four types of gendered voices and designed a user study to select the best one. In this section, we introduce the study, which investigates (1) gender-based effects of the autonomous robot on users and (2) users' attitude, acceptance, trust, and perceived trust toward the robot with different voices.
A. Method

1) Gender of the robot:
We manipulated the gender of the robot through non-verbal cues by changing the vocal characteristics. Because we aim to find the robot voice with the best performance, the voice selection is not strictly limited to robot gender effects. In this experiment, we prepared four types of voices: three gendered voices and a child voice. The gendered voices comprise a computer-generated neutral voice and a male voice and a female voice recorded by real adult humans; the child voice was recorded by a girl.
2) Procedures:
Among the various issues in human-robot interaction, trust has been nominated as one of the primary factors to consider. In this particular task, trust manifests as how much a human follows the advice given by the surveillance robot, and it crucially influences the robot's performance. To better measure users' experience of the robot, we use four dependent measures, including the users' attitude toward the robot, perceived ability, perceived trust, and acceptance of the robot. The details of the measures are shown in Table I.

As part of a larger study investigating users' perceptions of an autonomous surveillance robot, the participants filled out a survey measuring the factors shown in Table I. Each measure was assessed on a 5-point Likert scale (1 = strongly disagree, 5 = strongly agree).

The experiment used a between-subject design to minimize learning and transfer across conditions. Each participant viewed two videos and then responded to survey items related to the videos. Both videos demonstrate the same scenario with the same robot voice. One video was recorded from a third-person perspective, in which the robot walks toward a crowd while asking them to keep social distance. The other video was recorded from a first-person perspective, in which the robot walks toward the camera while asking the viewer to keep social distance. In both scenarios, the robot starts to play the voice at about 5 meters away from the crowd. Screenshots of the two videos are shown in Figure 5. In total, 8 videos were recorded: 2 perspectives times 4 types of robot voices. For each scenario, we added the description: "The robot shown in the videos is a surveillance robot working on keeping a low density of humans during COVID-19. When the robot finds a crowd, he/she/it will walk toward the crowd while asking them to keep a proper social distance. Please watch these two videos, and imagine you were one of the humans in the video, then answer the following questions."
3) Participants:
A total of 218 adults (119 males, 99 females) between 20 and 55 years old (M = 29.49, SD = 12.02) participated in the between-subject experiment. Participants were mostly students and staff from the Southern University of Science and Technology and were recruited through posters and links shared in a social media app. Each participant read and signed a consent form before starting the questionnaire.

Fig. 5: Screenshots of the two videos in the questionnaire. Top: third-person perspective; bottom: first-person perspective.
B. Data analyses
A manipulation check was performed to ensure that the robot voices manifested gender and age successfully. Perceived gender was measured with a sliding bar from 0 (most feminine) to 100 (most masculine), and perceived age with a sliding bar from 5 to 70. A one-way ANOVA showed that participants perceived the male voice as the most masculine (M = 76), the female voice as the most feminine (M = 49), and the neutral voice in between (M = 64), with a significant omnibus effect (F = 9). Participants also perceived the robot with the child's voice (M = 20) as significantly younger than the others (M = 28).

We calculated Cronbach's alpha values to assess the internal consistency of each psychometric measure. The reported alpha values were between 0.8 and 0.9, indicating that the items have relatively high internal consistency. To assess the significance of the user gender and robot voice type effects, a one-way ANOVA was conducted, with robot voice and user gender treated as independent variables. For factors that reached significant differences across conditions, we used the least significant difference (LSD) test to make pairwise comparisons.
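For reproducibility, the two analyses reduce to standard routines; below is a sketch with synthetic data standing in for the survey responses (the real per-participant scores are not published here).

```python
import numpy as np
from scipy.stats import f_oneway

def cronbach_alpha(items):
    # items: (participants x questionnaire items) matrix for one measure.
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1.0) * (1.0 - item_vars / total_var)

rng = np.random.default_rng(0)
# Synthetic stand-in: per-participant factor scores in the four voice groups.
groups = [rng.normal(loc=m, scale=0.8, size=54) for m in (3.9, 4.05, 4.06, 4.15)]
F, p = f_oneway(*groups)  # one-way ANOVA across voice conditions
alpha = cronbach_alpha(rng.integers(1, 6, size=(54, 4)))  # 5-point Likert items
```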
TABLE I: Dependent measures in the user questionnaire.
TABLE II: Means (standard deviations) of all the measures, reported as overall / male users / female users.

  Male voice (n = 54 overall; 23 male; 31 female)
    Perceived ability:      3.90 (0.78) / 3.62 (0.96) / 4.10 (0.56)
    Acceptance:             3.67 (0.98) / 3.59 (1.23) / 3.72 (0.76)
    Perceived trust:        3.91 (0.60) / 3.86 (0.78) / 3.95 (0.43)
    Attitude toward robots: 3.96 (0.98) / 3.78 (1.10) / 4.10 (0.88)

  Female voice (n = 53 overall; 31 male; 22 female)
    Perceived ability:      4.05 (0.81) / 3.98 (0.82) / 4.15 (0.80)
    Acceptance:             4.06 (0.80) / 4.05 (0.83) / 4.08 (0.78)
    Perceived trust:        4.09 (0.73) / 4.02 (0.72) / 4.18 (0.73)
    Attitude toward robots: 4.25 (0.79) / 4.21 (0.74) / 4.32 (0.87)

  Neutral voice (n = 56 overall; 31 male; 25 female)
    Perceived ability:      4.06 (0.77) / 3.91 (0.79) / 4.24 (0.72)
    Acceptance:             4.08 (0.73) / 3.89 (0.74) / 4.32 (0.65)
    Perceived trust:        4.20 (0.67) / 4.08 (0.69) / 4.36 (0.62)
    Attitude toward robots: 4.31 (0.66) / 4.16 (0.68) / 4.50 (0.61)

  Child voice (n = 55 overall; 34 male; 21 female)
    Perceived ability:      4.15 (0.66) / 4.12 (0.69) / 4.19 (0.64)
    Acceptance:             4.07 (0.72) / 4.02 (0.73) / 4.14 (0.70)
    Perceived trust:        4.13 (0.72) / 4.10 (0.77) / 4.19 (0.66)
    Attitude toward robots: 4.15 (0.73) / 4.16 (0.71) / 4.14 (0.76)
TABLE III: F values and significance of the robot voice effect and the user gender effect.

                          Robot voice effect                                    User gender effect
                          overall          male users       female users
                          F       p        F       p        F       p           F       p
  Perceived ability       1.027   0.382    1.761   0.158    0.221   0.882       5.160   0.024**
  Acceptance              3.362   0.020**  1.455   0.231    3.404   0.021**     1.285   0.258
  Perceived trust         1.866   0.136    0.563   0.640    2.214   0.092*      1.939   0.165
  Attitude toward robots  2.017   0.113    1.534   0.209    1.397   0.248       2.071   0.152

  * p < 0.1, ** p < 0.05

VII. EXPERIMENTS
In this section, we first validate the effectiveness of each proposed module individually. Then, we integrate all the modules to realize the autonomous surveillance robot. To further investigate the performance of the surveillance robot in promoting social distancing, we conduct real-world experiments at the end.
A. Crowd Gathering Detection
We first record vision and LiDAR data to better analyze and tune the social distancing detection system. The recorded dataset includes a wide variety of pedestrian group behaviors, such as walking, standing, gathering, and scattering.

Crowd gathering is not easy to quantify, especially because occlusion between pedestrians makes it difficult or even impossible for the robot to accurately acquire the location of each pedestrian. To detect each possible crowd gathering, we establish a graph-based pedestrian network called the social graph, with one example illustrated in Figure 6a. In the social graph, each node represents a pedestrian's position. Green, yellow, and red edges represent safe, warning, and dangerous social distances, respectively. We connect the nodes joined by red and yellow edges into a subgraph called the crowd graph, which is considered a possible crowd gathering. In this way, we reduce the dependence of crowd gathering detection on the accuracy of pedestrian position estimates.
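A minimal construction of the social graph and its crowd subgraphs, assuming hypothetical distance thresholds (the exact safe/warning/dangerous cutoffs are not stated above): connect pedestrians whose pairwise distance falls below the warning threshold and report the connected components as candidate gatherings.

```python
import numpy as np
import networkx as nx  # lightweight helper for connected components

# Hypothetical thresholds in meters; the actual values are not given above.
WARNING_DIST, DANGEROUS_DIST = 2.0, 1.0

def crowd_graphs(positions):
    """positions: list of (x, y) pedestrian estimates in the robot frame."""
    g = nx.Graph()
    g.add_nodes_from(range(len(positions)))
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            d = float(np.linalg.norm(np.subtract(positions[i], positions[j])))
            if d < DANGEROUS_DIST:
                g.add_edge(i, j, level='dangerous')   # red edge
            elif d < WARNING_DIST:
                g.add_edge(i, j, level='warning')     # yellow edge
    # Crowd graphs: connected components over warning/dangerous edges only.
    return [c for c in nx.connected_components(g) if len(c) > 1]

gatherings = crowd_graphs([(0, 0), (0.8, 0.2), (5, 5), (5.5, 5.1)])
# -> [{0, 1}, {2, 3}]: two possible crowd gatherings
```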
B. Navigation in Urban Scenarios
Urban navigation mainly faces two challenges: unstructured environments and dynamic obstacles. Thanks to the superior mobility of the quadruped platform, our robot can navigate over uneven terrain such as steps without extra visual estimation effort, as shown in Figure 7, and thus handles unstructured environments easily. To validate the dynamic collision avoidance performance among pedestrians, we create a crowded and narrow indoor scenario in the lab, as shown in Figure 8. In this experiment, the robot is required to track a specified target (a bone in this work). We installed several ultra-wideband (UWB) tags in the lab for indoor localization. During the experiments, each lasting about 30 minutes, the robot dog mounted with a 3D LiDAR achieved nearly zero collisions in this scenario. This experiment indicates that our learning-based collision avoidance policy can be successfully transferred and deployed to the real-world robot dog.
C. Voice Preference
Fig. 6: Illustration of the crowd gathering detection within the right camera's field of view (FOV). (a) Visualization of the social distancing detection. (b) Bird's-eye view of the scenario. Although the estimated positions of pedestrians are not very accurate, we can still detect possible crowd gatherings by establishing the crowd subgraph.

Fig. 7: The legged robot can traverse uneven terrain.

Fig. 8: The dynamic collision avoidance experiments. We arranged 6 moving pedestrians in this narrow scenario.

Table II shows the means and standard deviations of all measures according to robot voice type and user gender. The score of each factor was calculated by averaging all the related items. The male voice type received the lowest average score on all measures, and female users gave the highest marks on all measures for the neutral voice type.

Table III reports the F and p values. The results show that for male users there is no significant difference among robot voice types, while for female users the robot voice type significantly influences acceptance (p = 0.021) and perceived trust (p = 0.092). To find which conditions differ for female users, we used the least significant difference (LSD) test for pairwise comparisons between robot voice types. Surprisingly, for female users, acceptance, perceived trust, and attitude toward the robot in the neutral voice condition are higher than in the other voice conditions, especially the male voice condition.

We also compared the effects of user gender across conditions. Female users rated the robot's perceived ability higher than male users did (p = 0.024), especially in the male voice condition. In the neutral voice condition, female users' acceptance and attitude toward robots were significantly higher than male users'. There was no significant difference in male users' ratings across robot voice types. However, it is quite surprising that female users rated the neutral robot voice very highly. In addition, although surveillance is stereotypically a masculine job, both male and female users gave all four factors their lowest marks in the male voice condition. Therefore, we do not recommend using the male robot voice. Considering the means of the four factors across all voice conditions, we selected the neutral voice for our surveillance robot.

D. Real-world Experiment on Promoting Social Distancing
Finally, we integrate all the above modules and investigate whether the robot can navigate in complex urban environments and promote social distancing effectively without terrifying general citizens. The real-world experiment was conducted in two public areas: a university campus and a park. Figure 9 shows some examples from the real-world experiment. The results show that our robot successfully fulfills the task of promoting social distancing. Of the people who interacted with our robot, about half followed the robot's suggestions.
Fig. 9: Examples from the real-world experiment. The top and bottom rows show two different scenarios. Left: the robot detected and approached the crowd, then persuaded them to keep social distance. Right: the crowd density decreased.
Of the remaining people, most glanced at the robot and then walked away, while some stopped and looked at it. It is worth noting that at the time of our experiment there were no active COVID-19 cases in the testing city, which tends to reduce pedestrians' compliance with verbal social distancing suggestions. During the experiment, we randomly selected some people and asked about their attitude toward the robot and why they did or did not follow its advice. Some people reported that they felt it is a great idea to use a surveillance robot and that the robot's advice was reasonable. Many people also reported that the robot looked like it came from the world of science fiction, so they were very curious about it. However, some people felt the robot was not friendly enough, so they just wanted to walk away. Among the people who ignored the robot's advice, most said that the pandemic was not severe, so they felt it was unnecessary to keep the distance.
VIII. CONCLUSION
In the context of the COVID-19 pandemic, we have developed an autonomous surveillance robot system to promote social distancing. The robot system is mainly composed of social distancing detection, urban navigation, and intelligent voice interaction. The legged robot shows good adaptation to different terrain, so it can work well in everyday human environments. The real-world experiment also demonstrates that our robot successfully promotes social distancing. In the end, we successfully deployed the system in a real environment to help prevent the spread of COVID-19.

REFERENCES

[2] arXiv preprint arXiv:2008.06585, 2020.
[3] T. Fan, X. Cheng, J. Pan, D. Manocha, and R. Yang, "CrowdMove: Autonomous mapless navigation in crowded scenarios," arXiv preprint arXiv:1807.07870, 2018.
[4] R. Chandra, U. Bhattacharya, A. Bera, and D. Manocha, "DensePeds: Pedestrian tracking in dense crowds using Front-RVO and sparse features," arXiv preprint arXiv:1906.10313, 2019.
[5] R. Chandra, U. Bhattacharya, C. Roncal, A. Bera, and D. Manocha, "RobustTP: End-to-end trajectory prediction for heterogeneous road-agents in dense traffic with noisy sensor inputs," in ACM Computer Science in Cars Symposium, 2019, pp. 1–9.
[6] K. Fang, Y. Xiang, X. Li, and S. Savarese, "Recurrent autoregressive networks for online multi-object tracking," IEEE, 2018, pp. 466–475.
[7] N. Wojke, A. Bewley, and D. Paulus, "Simple online and realtime tracking with a deep association metric," IEEE, 2017, pp. 3645–3649.
[8] G. Antonini, S. V. Martinez, M. Bierlaire, and J. P. Thiran, "Behavioral priors for detection and tracking of pedestrians in video sequences," International Journal of Computer Vision, vol. 69, no. 2, pp. 159–180, 2006.
[9] T. Robin, G. Antonini, M. Bierlaire, and J. Cruz, "Specification, estimation and validation of a pedestrian walking behavior model," Transportation Research Part B: Methodological, vol. 43, no. 1, pp. 36–56, 2009.
[10] S.-Y. Chung and H.-P. Huang, "A mobile robot that understands pedestrian spatial behaviors," IEEE, 2010, pp. 5861–5866.
[11] D. Helbing and P. Molnar, "Social force model for pedestrian dynamics," Physical Review E, vol. 51, no. 5, p. 4282, 1995.
[12] R. Mehran, A. Oyama, and M. Shah, "Abnormal crowd behavior detection using social force model," IEEE, 2009, pp. 935–942.
[13] S. Pellegrini, A. Ess, K. Schindler, and L. Van Gool, "You'll never walk alone: Modeling social behavior for multi-target tracking," IEEE, 2009, pp. 261–268.
[14] S. Pellegrini, A. Ess, and L. Van Gool, "Improving data association by joint modeling of pedestrian trajectories and groupings," in European Conference on Computer Vision. Springer, 2010, pp. 452–465.
[15] W. Choi, K. Shahid, and S. Savarese, "What are they doing?: Collective activity classification using spatio-temporal relationship among people," IEEE, 2009, pp. 1282–1289.
[16] K. Yamaguchi, A. C. Berg, L. E. Ortiz, and T. L. Berg, "Who are you with and where are you going?" in CVPR 2011. IEEE, 2011, pp. 1345–1352.
[17] H. Sheng, L. Hao, J. Chen, Y. Zhang, and W. Ke, "Robust local effective matching model for multi-target tracking," in Pacific Rim Conference on Multimedia. Springer, 2017, pp. 233–243.
[18] S. Hong, T. You, S. Kwak, and B. Han, "Online tracking by learning discriminative saliency map with convolutional neural network," in International Conference on Machine Learning, 2015, pp. 597–606.
[19] C. Ma, J.-B. Huang, X. Yang, and M.-H. Yang, "Hierarchical convolutional features for visual tracking," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 3074–3082.
[20] L. Wang, W. Ouyang, X. Wang, and H. Lu, "STCT: Sequentially training convolutional networks for visual tracking," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1373–1381.
[21] Q. Chu, W. Ouyang, H. Li, X. Wang, B. Liu, and N. Yu, "Online multi-object tracking using CNN-based single object tracker with spatial-temporal attention mechanism," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 4836–4845.
[22] K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask R-CNN," in IEEE International Conference on Computer Vision, 2017, pp. 2961–2969.
[23] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788.
[24] J. Van den Berg, M. Lin, and D. Manocha, "Reciprocal velocity obstacles for real-time multi-agent navigation," in IEEE International Conference on Robotics and Automation, 2008, pp. 1928–1935.
[25] J. Van den Berg, S. J. Guy, M. Lin, and D. Manocha, "Reciprocal n-body collision avoidance," in Robotics Research. Springer, 2011, pp. 3–19.
[26] P. Trautman and A. Krause, "Unfreezing the robot: Navigation in dense, interacting crowds," in IROS, 2010, pp. 797–803.
[27] P. Trautman, J. Ma, R. M. Murray, and A. Krause, "Robot navigation in dense human crowds: the case for cooperation," in ICRA, 2013.
[28] Y. F. Chen, M. Everett, M. Liu, and J. P. How, "Socially aware motion planning with deep reinforcement learning," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2017, pp. 1343–1350.
[29] M. Everett, Y. F. Chen, and J. P. How, "Motion planning among dynamic, decision-making agents with deep reinforcement learning," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2018, pp. 3052–3059.
[30] P. Long, T. Fan, X. Liao, W. Liu, H. Zhang, and J. Pan, "Towards optimally decentralized multi-robot collision avoidance via deep reinforcement learning," in Proceedings of the IEEE International Conference on Robotics and Automation, 2018, pp. 6252–6259.
[31] T. Fan, P. Long, W. Liu, and J. Pan, "Distributed multi-robot collision avoidance via deep reinforcement learning for navigation in complex scenarios," The International Journal of Robotics Research, 2020.
[32] T. Fan, X. Cheng, J. Pan, P. Long, W. Liu, R. Yang, and D. Manocha, "Getting robots unfrozen and unlost in dense pedestrian crowds," IEEE Robotics and Automation Letters, vol. 4, no. 2, pp. 1178–1185, 2019.
[33] N. L. Muscanell and R. E. Guadagno, "Make new friends or keep the old: Gender and personality differences in social networking use," Computers in Human Behavior, vol. 28, no. 1, pp. 107–112, 2012.
[34] B. Tay, Y. Jung, and T. Park, "When stereotypes meet robots: the double-edge sword of robot gender and personality in human–robot interaction," Computers in Human Behavior, vol. 38, pp. 75–84, 2014.
[35] D. Kuchenbrandt, M. Häring, J. Eichberg, F. Eyssel, and E. André, "Keep an eye on the task! How gender typicality of tasks influence human–robot interactions," International Journal of Social Robotics, vol. 6, no. 3, pp. 417–427, 2014.
[36] M. Siegel, C. Breazeal, and M. I. Norton, "Persuasive robotics: The influence of robot gender on human behavior," IEEE, 2009, pp. 2563–2568.
[37] A. Best, S. Narang, and D. Manocha, "Real-time reciprocal collision avoidance with elliptical agents," IEEE, 2016, pp. 298–305.
[38] H. W. Kuhn, "The Hungarian method for the assignment problem," Naval Research Logistics Quarterly, vol. 2, no. 1-2, pp. 83–97, 1955.
[39] M. A. Fischler and R. C. Bolles, "Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography," Communications of the ACM, vol. 24, no. 6, pp. 381–395, 1981.
[40] T. Shan and B. Englot, "LeGO-LOAM: Lightweight and ground-optimized lidar odometry and mapping on variable terrain," 2018, pp. 4758–4765.
[41] M. Magnusson, A. Lilienthal, and T. Duckett, "Scan registration for autonomous mining vehicles using 3D-NDT," Journal of Field Robotics, vol. 24, no. 10, pp. 803–827, 2007.
[42] M. Magnusson, A. Nuchter, C. Lorken, A. J. Lilienthal, and J. Hertzberg, "Evaluation of 3D registration reliability and speed: A comparison of ICP and NDT," IEEE, 2009, pp. 3907–3912.
[43] P. J. Besl and N. D. McKay, "Method for registration of 3-D shapes," in Sensor Fusion IV: Control Paradigms and Data Structures, vol. 1611. International Society for Optics and Photonics, 1992, pp. 586–606.
[44] M. Likhachev and D. Ferguson, "Planning long dynamically feasible maneuvers for autonomous vehicles," The International Journal of Robotics Research, 2009.