DARE-SLAM: Degeneracy-Aware and Resilient Loop Closing in Perceptually-Degraded Environments
Kamak Ebadi, Matteo Palieri, Sally Wood, Curtis Padgett, Ali-akbar Agha-mohammadi
(Accepted for publication in the Journal of Intelligent and Robotic Systems, 2021.)
Abstract
Enabling fully autonomous robots capable of navigating and exploring large-scale, unknown and complex environments has been at the core of robotics research for several decades. A key requirement in autonomous exploration is building accurate and consistent maps of the unknown environment that can be used for reliable navigation. Loop closure detection, the ability to assert that a robot has returned to a previously visited location, is crucial for consistent mapping as it reduces the drift caused by error accumulation in the estimated robot trajectory. Moreover, in multi-robot systems, loop closures enable merging local maps obtained by a team of robots into a consistent global map of the environment. In this paper, we present a degeneracy-aware and drift-resilient loop closing method to improve place recognition and resolve 3D location ambiguities for simultaneous localization and mapping (SLAM) in GPS-denied, large-scale and perceptually-degraded environments. More specifically, we focus on SLAM in subterranean environments (e.g., lava tubes, caves, and mines) that represent examples of complex and ambiguous environments where current methods have inadequate performance. The first contribution of this paper is a degeneracy-aware lidar-based SLAM front-end to determine the observability and level of geometric degeneracy in an unknown environment. Using this crucial capability, ambiguous and unobservable areas in an unknown environment are determined and excluded from the search for loop closures to avoid distortions of the entire map as the result of spurious or inaccurate loop closures. The second contribution of this paper is a drift-resilient loop closing pipeline that exploits the salient 2D and 3D features extracted from lidar point cloud data to enable a robust multi-stage loop closing capability. Unlike methods that perform the search for loop closures locally to compensate for the ambiguity and high computational cost associated with lidar-based loop closures in large-scale environments, our proposed method relies on a rapid pre-matching step that enables the search over the entire robot trajectory, and thus, it is not affected by drift and accumulation of errors in the robot trajectory. We present an extensive evaluation and analysis of performance and robustness, and provide a comparison of localization and mapping results with state-of-the-art methods in a variety of extreme and perceptually-degraded underground mines across the United States.

The research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. This work was partially funded by the Defense Advanced Research Projects Agency (DARPA). Government sponsorship acknowledged. ©2020 All rights reserved.

Kamak Ebadi, NASA Jet Propulsion Laboratory, California Institute of Technology; Santa Clara University. E-mail: [email protected]
Matteo Palieri, NASA Jet Propulsion Laboratory, California Institute of Technology; Polytechnic University of Bari
Sally Wood, Department of Electrical and Computer Engineering, Santa Clara University
Curtis Padgett and Ali-akbar Agha-mohammadi, NASA Jet Propulsion Laboratory, California Institute of Technology
Keywords
Degeneracy-aware SLAM · navigation in perceptually-degraded environments · saliency-based loop closing · map alignment and merging

1 Introduction

Mobile robots rely on a model of the environment for navigation, collision avoidance and path planning. In GPS-denied and extreme unknown environments where no prior map of the environment is available, robots use onboard sensing to construct locally accurate maps that can be used for navigation [1]. Multi-robot systems have been increasing in popularity over the past few decades [2, 3] due to their potential advantages: higher performance and efficiency in performing spatially distributed tasks such as exploring and mapping a large-scale unknown environment, higher fault-tolerance and information redundancy, and scalability. While great progress has been made over the past decades in the field of single-robot SLAM [4], extending these approaches to multi-robot systems in large-scale and ambiguous environments still remains a challenge. As illustrated in Fig. 1, perceptually-degraded environments are characterized by poor illumination and sparsity of salient perceptual and geometric features. Noisy measurements obtained in these environments lead to greater accumulation of errors in the estimated robot trajectory, which subsequently makes detection of intra- and inter-robot loop closures more challenging. This in turn affects the quality and consistency of constructed global maps. Robust and reliable localization and mapping in these environments can enable a wide range of terrestrial and planetary applications [5], ranging from disaster relief in hostile environments to robotic exploration of lunar and Martian caves that are of particular interest as they can provide potential habitats for future manned space missions [6].

SLAM algorithms aim to recover the most probable robot trajectory and map of the environment using the onboard sensing system.
Many solutions have been proposed based on different sensing modalities, including vision [7], visual-inertial [8–10], thermal-inertial [11], lidar-inertial [12], range-based SLAM [13], or combinations of them [14, 15].

Fig. 1
Examples of challenges introduced by perceptually-degraded subterranean environments to commonly used perception systems.

Vision-based localization and place recognition is a popular and well-studied problem in the robotics and computer vision literature [16], and while great progress has been made in visual loop closure detection [17, 18], vision-based methods face many challenges in perceptually-degraded environments, including reduced visibility (i.e., inconsistent illumination or lack thereof, fog, and dust) and changes in appearance caused by viewing-angle variations between different visits. Moreover, sparsity of salient perceptual features can lead to data association ambiguity and perceptual aliasing [19, 20]. Cameras and lidars are often used in conjunction due to their complementary nature [21, 22]. A 3D lidar is a popular solution for mapping perceptually-degraded environments, as it does not rely on external light sources and can provide range data over a wide horizontal field of view at a high temporal and spatial sampling rate.

Many scan registration methods have been developed for pose estimation from lidar data. Since raw lidar scans typically contain a large number of noisy and redundant points, methods are needed to downsample the point clouds in order to achieve real-time odometry and mapping performance with higher accuracy. Nuchter et al. [23] present a lidar-based 6D SLAM algorithm that is used on a mine inspection robot. In order to achieve real-time performance, the authors propose a fast filtering method based on combining a median and a reduction filter to achieve significant data reduction in lidar scans while maintaining the surface structure in subterranean environments. Zhang et al. [8, 24] present lidar odometry and mapping (LOAM), which relies on feature point extraction and matching to achieve real-time performance while minimizing motion distortions.
While LOAM achieves good performance, it currently does not recognize loop closures, which could help reduce the drift in the estimated robot trajectory. Ji et al. [25] present a 3D lidar mapping algorithm similar to LOAM that relies on segment-based mapping and place recognition [26, 27] in 3D point clouds to detect loop closures in structured street environments. Shan et al. [28] propose the lightweight and ground-optimized (LeGO-LOAM) algorithm, an extension of LOAM that relies on a ground segmentation capability to discard points that may represent unreliable features. While their proposed method maintains the local consistency of the ground plane between consecutive frames, it does not use a global ground constraint, which makes it susceptible to the accumulation of rotational error. In large-scale underground environments with uneven terrain, this could lead to large errors in the estimated robot poses and trajectories. In LeGO-LOAM, the search for loop closures is performed locally based on the estimated robot poses. This could lead to missed loop closure opportunities when drift in the robot trajectory is significant. Hess et al. [29] present Cartographer, with real-time 2D mapping and loop closure capability. By combining a few consecutive lidar scans into a local submap, loop closures are detected by matching a large set of submaps. Upon detection of loop closure candidates, outlier loop closures are rejected by relying on the Huber loss in Sparse Pose Adjustment [30].

Over the past decades, a number of efforts have focused specifically on localization and mapping in underground mines and tunnels. Thrun et al. [31] propose a SLAM algorithm for volumetric mapping of large underground mines. In order to reduce the drift in robot trajectories, the method uses a modified version of the Iterative Closest Point (ICP) [32] algorithm to detect loop closures. Tardioli et al. [33] propose a system for exploration of a tunnel using a robot team.
The system relies on a feature-based robot LOcalization Module (LOM) that is responsible for global localization by relying on pre-existing two-dimensional laser-segmentation-based maps. Global localization is achieved by matching observations of the environment with the map features. In a similar work, Tardioli et al. [34] propose an underground localization algorithm to enable autonomous navigation of a commercial dumper commonly used for underground construction. Localization is achieved by relying on semantic feature recognition in the tunnels to provide local information sufficient for successful localization.

Zlot et al. [35] present a 3D SLAM solution consisting of a spinning 2D lidar and an industrial-grade MEMS IMU to map a long underground mine. In order to reduce the drift in the open-loop trajectory, the authors rely on a global optimization algorithm where the entire robot trajectory is optimized given the set of computed constraints. Furthermore, a set of anchor points, which are rough locations manually extracted from the 2D mine survey, is used to further reduce the drift. Leingartner et al. [36] evaluate how well off-the-shelf sensors and mapping solutions work in two different field experiments: one from a disaster-relief operation in a long motorway tunnel, and one from a mapping experiment in a partly closed-down tunnel. The authors conclude that despite advances in visual SLAM systems, lidar-based solutions have superior localization and mapping performance in dark and perceptually-degraded underground tunnels. Jacobson et al. [37] propose a monocular SLAM system for online localization and mapping in an underground mine with minimal human intervention. Their method leverages surveyed mine maps to increase mapping accuracy and enable global localization.
In order to reduce the drift in the estimated robot trajectory, a vision-based loop closure detection method is used where a new frame is compared to all previously obtained frames for place recognition.

The earliest laser-based loop closure detection methods were based on registration of the shape of the laser scans [38–40]. Cox et al. [38] proposed a method designed for use in structured office and factory environments where points were aligned with line segments extracted from a known environment representation. Gutmann et al. [39] developed a scan-to-scan alignment approach that relied on the extraction of line segments from reference scans. Lu et al. [40] improved the point-to-line matching method of Cox with an ICP-based point-to-point matching approach that does not require feature extraction or segmentation. Moreover, similar to feature-based registration of 2D images [41], several methods based on feature extraction from lidar point clouds have been proposed [8, 42–44] where the scan registration is achieved by matching the computed feature descriptors.

Over the past decades, the ICP algorithm has been commonly used for lidar scan registration and loop closure detection [28, 45–48]. In order to detect a loop closure, a lidar scan is registered to previously obtained scans to find a match. In large-scale or long-term operations where a large number of lidar scans are obtained over time, not only can this method become increasingly computationally expensive, but it can also lead to spurious or inaccurate loop closures, particularly in ambiguous environments. In order to improve the performance and accuracy of loop closures, a common approach is to constrain the search for loop closures to a fixed radius centered at estimated robot poses.
While this method is effective in reducing the computational load and the number of spurious loop closures, it could lead to missed loop closure opportunities when the accumulation of errors in lidar odometry leads to significant drift in the robot trajectory.

This paper extends our previous work on large-scale autonomous mapping and positioning (LAMP) [48] in a number of significant ways to improve localization and mapping in perceptually-degraded environments. The core contributions of this paper are:

1. A real-time degeneracy-aware SLAM front-end to determine the level of geometric degeneracy in unknown environments. Using this crucial capability, ambiguous areas that could lead to data association ambiguity and spurious loop closures are identified and excluded from the search for loop closures. This significantly improves the quality and accuracy of loop closures because the search for loop closures can be expanded as needed to account for drift in the robot trajectory, while decreasing rather than increasing the probability of spurious loop closures.

2. A drift-resilient and pose-invariant loop closing method based on salient 2D and 3D geometric features extracted from lidar point cloud data to enable detection of loop closures with increased robustness and accuracy as compared to traditional geometric methods. The proposed method does not require additional sensors to achieve higher performance, and instead relies on exploiting salient features embedded in 2D occupancy grid maps commonly used in robot navigation to obtain more information about the spatial configuration of the local environment from lidar point cloud data. The method is not dependent on the estimated robot poses, a key advantage that makes it invariant to drift in robot trajectories. In the first stage of the process, 2D features extracted from salient occupancy grid maps are used to search for potential loop closures over the entire robot trajectory.
This significantly reduces the number of missed loop closure opportunities as compared to geometric methods that perform the search for loop closures locally. In the second stage, a geometric verification step is used to verify the quality of each loop closure candidate. Subsequently, a back-end equipped with an outlier rejection capability performs the map merging and alignment.

3. An extensive evaluation and comparison of performance with state-of-the-art methods in a variety of extreme subterranean environments. This includes data collected at the Tunnel Circuit of the DARPA Subterranean Challenge [49] in the Bruceton Safety Research coal mine and Experimental mine in Pittsburgh, PA.

The rest of this paper is organized as follows: in Section 2 we present the problem formulation. Our lidar-based SLAM architecture, which includes the degeneracy-aware SLAM front-end, the drift-resilient loop closing pipeline, and the back-end, is presented in Section 3. Experimental results are presented in Section 4. Finally, Section 5 discusses conclusions and future research directions.
2 Problem Formulation

In extreme and unknown environments with sparse salient geometric structures, lidar odometry is challenged with a higher probability of data association ambiguity that could lead to large errors when estimating the relative robot motion. Robust loop closure detection is crucial to improve localization and mapping accuracy in these environments. Moreover, in a multi-robot collaborative SLAM system, loop closures are critical to find correspondences between the maps obtained by individual robots in order to merge them into a consistent global map of the environment.

The focus of this work is developing methods for resolving location ambiguities and improving loop closing in extreme environments that do not provide sufficient distinctive information to establish global position. While our proposed method can be valuable for map alignment and merging in multi-robot systems, it naturally falls back to loop closing for robust localization and mapping in a single-robot system. As illustrated in Fig. 2, in order to represent a robot trajectory and the map of the environment, a graph-based formulation is used [4], where every node in the graph corresponds to a robot pose, and every edge connecting two nodes expresses the relative 3D motion between the corresponding poses. We denote the trajectory of $N_\alpha$ poses of robot $\alpha$ by the sequence $\mathbf{x}^\alpha \doteq [x^\alpha_i]_{i=1:N_\alpha}$, where $x^\alpha_i \doteq [R^\alpha_i, t^\alpha_i] \in \mathrm{SE}(3)$ is the robot pose associated with the $i$-th node in the graph, which includes a rotation $R^\alpha_i \in \mathrm{SO}(3)$ and a translation $t^\alpha_i \in \mathbb{R}^3$.
Fig. 2
Illustration of multi-robot pose graph SLAM for a team of two robots. The yellow nodes correspond to the robot poses, while blue squares represent the relative 3D rigid transformations between the nodes.
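To make the graph structure of Fig. 2 concrete, the following is a minimal sketch of a pose-graph container, reduced here to SE(2) poses `(x, y, theta)` for brevity (the paper works in SE(3)); `PoseGraph` and `compose` are illustrative names, not part of the authors' implementation.

```python
import math

class PoseGraph:
    """Minimal pose-graph container: each node stores an SE(2) pose
    (x, y, theta) and each edge stores the relative motion between two
    node indices, as illustrated in Fig. 2."""
    def __init__(self):
        self.nodes = []   # list of (x, y, theta) poses
        self.edges = []   # list of (i, j, (dx, dy, dtheta)) constraints

    def add_node(self, pose):
        self.nodes.append(pose)
        return len(self.nodes) - 1

    def add_edge(self, i, j, rel):
        self.edges.append((i, j, rel))

def compose(pose, rel):
    """Compound a pose with a relative motion expressed in the pose's own frame."""
    x, y, th = pose
    dx, dy, dth = rel
    return (x + math.cos(th) * dx - math.sin(th) * dy,
            y + math.sin(th) * dx + math.cos(th) * dy,
            th + dth)

# Build a two-node graph from a single odometric estimate.
g = PoseGraph()
i = g.add_node((0.0, 0.0, 0.0))
rel = (1.0, 0.0, math.pi / 2)                  # 1 m forward, then turn left
j = g.add_node(compose(g.nodes[i], rel))
g.add_edge(i, j, rel)
```

A loop closure edge would be added the same way, only between non-consecutive node indices (or across two robots' graphs).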
The lidar point cloud obtained at each pose is associated with the corresponding node in the pose graph. Each robot builds an explicit map of the environment by projecting the point clouds associated with the nodes into a common world coordinate system $W$. Considering a centralized multi-robot system, when a robot is within wireless communication range, it communicates its local pose graph to a base station. Assuming a partial overlap among the areas explored by all robots, the base station is responsible for merging the local graphs to produce a consistent global map of the environment that can be used for reliable navigation of robots in the future.

The set of all raw lidar scans obtained along the trajectory of robot $\alpha$ is denoted by $\mathbf{z}^\alpha = \{z^\alpha_i\}_{i=1:N_\alpha}$, where $z^\alpha_i = \{p_k\}_{k=1:N_z}$ is the $i$-th lidar scan, represented as a collection of 3D points $p_k \in \mathbb{R}^3$ in the local lidar coordinate system $L_i$. We denote the measurement model by

$$z^\alpha_i = h(x^\alpha_i, m^\alpha) + w^\alpha_i, \qquad (1)$$

where $h(\cdot)$ returns the vector connecting the current lidar pose to a point in the environment along each lidar ray, $m^\alpha$ represents the actual local environment explored by robot $\alpha$, and $w^\alpha_i \sim \mathcal{N}(0, \Sigma^\alpha)$ is assumed to be Gaussian noise with zero mean and covariance $\Sigma^\alpha$.

In lidar-based state estimation, the odometric estimates are obtained by computing the relative pose transformation between consecutive lidar scans. In this paper, we use the Generalized ICP algorithm [52] to obtain $u^{t+1}_t \doteq [R^{t+1}_t, t^{t+1}_t]$, an odometric estimate that encodes the relative 3D motion between two consecutive lidar scans obtained at times $t$ and $t+1$. As the lidar operates at a high temporal sampling rate relative to the robot motion, the pose graph would be oversampled in time if a new node were instantiated after each new odometric measurement.
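The map-building step described above (projecting each scan from its local frame $L_i$ into the world frame $W$ using the pose of the corresponding node) can be sketched in 2D as follows; this is an illustrative simplification, not the authors' code.

```python
import math

def scan_to_world(points, pose):
    """Transform lidar points from the local frame L_i into the world
    frame W using the node pose (x, y, theta): rotate, then translate."""
    x, y, th = pose
    c, s = math.cos(th), math.sin(th)
    return [(x + c * px - s * py, y + s * px + c * py) for px, py in points]

# A robot at (2, 0), rotated 90 degrees, observes two points in its own frame.
scan = [(1.0, 0.0), (0.0, 1.0)]
world = scan_to_world(scan, (2.0, 0.0, math.pi / 2))
```

Concatenating `scan_to_world` outputs over all key-nodes yields the explicit map; in SE(3) the rotation becomes a 3x3 matrix but the structure is identical.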
We utilize a reduced pose graph where a new node is instantiated in the graph after reaching a minimum odometric displacement (a threshold on rotation or translation). We refer to the nodes in the reduced graph as key-nodes, and the lidar scans associated with them as key-scans. The relative 3D motion $u^{i+1}_i$ between poses associated with key-nodes $i$ and $i+1$ is obtained from the integration of incremental motion estimates between times $t$ and $t+n_i$.

Since the lidar measurements are noisy, each lidar-based odometric estimate will include some error as the result of point cloud matching errors. Since the robot trajectory is built incrementally from the accumulation of odometric estimates, the accumulation of translational and angular errors in a sequence of odometric estimates can lead to an unbounded drift in the estimated robot trajectory over unlimited time. In order to bound and reduce this drift, our method relies on a degeneracy-aware and saliency-based loop closing method, where we constrain the search for loop closures to only the most observable areas in the unknown environment. The saliency-based loop closing pipeline can be abstractly formulated as

$$LC_g(LC_s(\alpha_i, \beta_j)) = \begin{cases} 0 & \text{if no loop closed} \\ 1 & \text{if loop closed,} \end{cases} \qquad (2)$$

where $LC_s(\cdot,\cdot)$ and $LC_g(\cdot,\cdot)$ denote the stages of pre-matching and geometric verification based on salient 2D and 3D features extracted from lidar point cloud data, respectively. In the first stage, 2D occupancy grid maps are used to create bird's-eye views of point cloud data to extract additional information about the spatial configuration of local scenes. This enables the pre-matching stage $LC_s(\cdot,\cdot)$, where putative loop closures are identified by evaluating the similarity between the spatial configurations of fully observable scenes along the robot trajectory. Upon detection of putative loop closures, the geometric verification stage $LC_g(\cdot,\cdot)$
is used to evaluate the quality of each loop closure candidate and to compute the 3D rigid transformation between the pair of lidar scans. When $LC_g(LC_s(\alpha_i, \beta_j)) = 1$, an edge $u_{\alpha_i \beta_j}$ connecting the nodes $x^\alpha_i$ and $x^\beta_j$ is added to the pose graph to represent a loop closure. When $\alpha = \beta$, it represents an intra-robot loop closure found in a local pose graph; otherwise, it represents an inter-robot loop closure found between two local pose graphs.

In a single-robot SLAM system, the goal is to estimate the unknown robot trajectory and the map of the environment from the set of all odometry and loop closure estimates. Let $G = \langle \mathbf{x}, \mathbf{u}, \mathbf{z} \rangle$ denote the local pose graph comprised of the robot trajectory $\mathbf{x}$, the set of all odometric and intra-robot loop closure estimates $\mathbf{u}$, and the set of all key-scans $\mathbf{z}$. Since the process happens on the same robot, the robot name is dropped for brevity. The maximum a-posteriori (MAP) estimate of the robot trajectory can be obtained using Pose Graph Optimization (PGO) [4] as given by

$$\hat{\mathbf{x}} = \operatorname*{argmax}_{\mathbf{x}} \; p(\mathbf{x} \mid \mathbf{u}) = \operatorname*{argmax}_{\mathbf{x}} \; p(\mathbf{x}) \, p(\mathbf{u} \mid \mathbf{x}), \qquad (3)$$

where $p(\mathbf{x})$ is the prior probability over $\mathbf{x}$, and $p(\mathbf{u} \mid \mathbf{x})$ is the measurement likelihood. When no prior knowledge is available over the robot trajectory, the term $p(\mathbf{x})$ is assumed to be a uniform distribution and does not affect the result. This reduces the MAP estimation problem to maximum likelihood (ML) estimation as given by

$$\hat{\mathbf{x}} = \operatorname*{argmax}_{\mathbf{x}} \prod_{i=1}^{|\mathbf{u}|} p(u_i \mid x_i) = \operatorname*{argmin}_{\mathbf{x}} -\log \Big( \prod_{i=1}^{|\mathbf{u}|} p(u_i \mid x_i) \Big) = \operatorname*{argmin}_{\mathbf{x}} -\log \Big( \underbrace{\prod_{i=1}^{N} p(u^{i+1}_i \mid x_i, x_{i+1})}_{\text{Odometry}} \; \underbrace{\prod_{u_{ij} \in \mathbf{u}_c} p(u_{ij} \mid x_i, x_j)}_{\text{Loop Closure}} \Big), \qquad (4)$$

where $N$ is the total number of nodes in the graph, and $\mathbf{u}_c$ denotes the set of all loop closure constraints.
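The two-stage formulation of Eq. (2) can be sketched as a generic gate: a cheap pre-matching stage $LC_s$ filters candidate pairs, and a geometric-verification stage $LC_g$ accepts only well-fitting ones. The scoring functions and thresholds below are placeholders, not the paper's actual 2D-feature similarity or registration-fitness measures.

```python
def propose_and_verify(candidates, prematch, verify, sim_thresh, fit_thresh):
    """Two-stage loop closing in the spirit of Eq. (2): LC_s rejects
    dissimilar scene pairs early; LC_g accepts a pair only when its
    alignment residual is small enough."""
    accepted = []
    for pair in candidates:
        if prematch(pair) < sim_thresh:    # LC_s: pre-matching on 2D features
            continue
        if verify(pair) <= fit_thresh:     # LC_g: geometric verification
            accepted.append(pair)
    return accepted

# Hypothetical similarity and alignment-residual scores for two candidate pairs.
sim = {("a1", "b3"): 0.9, ("a1", "b7"): 0.2}
fit = {("a1", "b3"): 0.05, ("a1", "b7"): 0.01}
loops = propose_and_verify(list(sim), sim.get, fit.get, 0.5, 0.1)
```

Only the pair `("a1", "b3")` survives both stages; `("a1", "b7")` is rejected by pre-matching before the expensive geometric check ever runs, which is what makes the whole-trajectory search tractable.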
Let $x_j = f(x_i, u_{ij}) + v_{i,j}$ be the new pose update using the robot motion model from node $i$ to $j$, where $f(\cdot)$ computes the robot's nonlinear motion between the unknown robot poses $x_i$ and $x_j$, and $v_{i,j}$ is assumed to be zero-mean Gaussian noise with information matrix $\Omega_{i,j}$. The ML trajectory estimate can be computed by minimizing the mismatch between the relative pose measurements and the estimated robot poses as given by

$$\hat{\mathbf{x}} = \operatorname*{argmin}_{\mathbf{x}} \Big( \sum_{i=1}^{N} \big\| f(x_i, u^{i+1}_i) - x_{i+1} \big\|^2_{\Omega_{i,i+1}} + \sum_{u_{ij} \in \mathbf{u}_c} \big\| f(x_i, u_{ij}) - x_j \big\|^2_{\Omega_{i,j}} \Big), \qquad (5)$$

where $\| \cdot \|^2_{\Omega_{i,j}}$ denotes the squared Mahalanobis distance.

To extend this to a centralized multi-robot system, assuming known initial robot poses, a similar process is used on the base station to detect inter-robot loop closures and determine correspondences between pose graphs. Once the relative 3D rigid transformations between overlapping parts of robot trajectories are determined, pose graph optimization (3) is used to obtain the maximum a-posteriori (MAP) estimate of the global pose graph. The global map of the environment can then be constructed by projecting the key-scans into the common world coordinate system $W$.

3 Lidar-Based SLAM Architecture

The front-end processes the lidar measurements to produce odometric estimates and loop closure constraints. The odometric estimates are obtained by computing the relative 3D motion that best aligns consecutive lidar scans. Fig. 3 provides an overview of the front-end, which has three layers of processing to produce odometric estimates: (i) point cloud filtering, (ii) scan-to-scan registration, and (iii) scan-to-submap registration. In the following, we provide a short description of each layer.

The raw point clouds obtained by each scan are often uneven in density and contain redundant and noisy points that can lead to increased computational load and inaccurate odometric estimates using the ICP algorithm. To overcome these challenges, most methods [8, 23, 25, 28, 48] use a subset of the raw point cloud data to estimate the robot motion. Among point cloud filtering methods, random downsampling and voxel-grid filtering [25, 50, 51] are commonly used. In random downsampling, points are randomly sampled with uniform probability, whereas in voxel-grid filtering, a 3D voxel grid is created over the point cloud data, and the points in each voxel are approximated by their centroid. While these filtering approaches are effective in downsampling the raw point cloud data, they remove points regardless of their importance and effect on the performance and accuracy of lidar odometry.

Fig. 3
Overview of the lidar-based front-end and the local pose graph.
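The voxel-grid filtering discussed above (approximating the points of each voxel by their centroid) can be sketched as:

```python
from collections import defaultdict

def voxel_grid_filter(points, voxel):
    """Voxel-grid downsampling: bucket 3D points into cubic voxels of edge
    length `voxel`, then replace each bucket by its centroid."""
    buckets = defaultdict(list)
    for p in points:
        key = tuple(int(c // voxel) for c in p)   # floor-division voxel index
        buckets[key].append(p)
    return [tuple(sum(axis) / len(pts) for axis in zip(*pts))
            for pts in buckets.values()]

cloud = [(0.1, 0.1, 0.0), (0.3, 0.2, 0.0), (2.1, 0.0, 0.0)]
filtered = voxel_grid_filter(cloud, 1.0)   # first two points share one voxel
```

As the text points out, this keeps the density uniform but discards points regardless of their saliency, which motivates the feature-based decimation adopted in this work instead.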
In this work, using the method of Zhang et al. [8], we decimate a large fraction of the points in every raw lidar scan by extracting a set of uniformly distributed salient features located on sharp edges and planar surfaces. We then use the GICP algorithm to obtain odometric estimates based on a two-stage scan-to-scan and scan-to-submap registration process, as initially introduced in our prior work, LAMP [48]. In the scan-to-scan registration step, the incoming point cloud is aligned to the last obtained lidar scan to find a rigid-body transformation that best aligns the two point clouds. After this approximate localization, the solution is further refined by registering the incoming point cloud to a submap created from the local region of previously collected point cloud data in the pose graph. Given the current estimated pose of the robot in the world coordinate system $W$, the incoming lidar scan is transformed to the world coordinate system using the same transformation, and an approximate nearest-neighbors search is performed to find the closest points in the map. A second point cloud registration, similar to scan-to-scan registration, is used to obtain the odometric estimate $u^{t+1}_t \doteq [R^{t+1}_t, t^{t+1}_t] \in \mathrm{SE}(3)$ that describes the 3D motion of the robot between times $t$ and $t+1$.

Fig. 4 reports the accuracy of lidar odometry by measuring the Relative Pose Error (RPE) as a function of distance traversed, using the package for evaluation of odometry and SLAM (EVO) [75], in two perceptually-degraded underground mines. In the Safety Research mine, the robot autonomously navigates through the mine tunnels. By relying on scan-to-scan registration alone, the accumulation of errors results in substantial average drift in translation relative to the distance traversed.
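The core operation in both registration stages, estimating the rigid transform that best aligns two point sets, can be illustrated with a closed-form 2D alignment over known correspondences (effectively one ICP iteration); GICP additionally models local surface covariances, so this is a deliberate simplification.

```python
import math

def align_2d(src, dst):
    """Closed-form rigid alignment of paired 2D points (one ICP iteration
    with known correspondences): the rotation comes from the cross/dot
    sums of the centered pairs, the translation from the centroids."""
    n = len(src)
    csx = sum(p[0] for p in src) / n
    csy = sum(p[1] for p in src) / n
    cdx = sum(p[0] for p in dst) / n
    cdy = sum(p[1] for p in dst) / n
    num = den = 0.0
    for (ax, ay), (bx, by) in zip(src, dst):
        ax, ay = ax - csx, ay - csy          # center both point sets
        bx, by = bx - cdx, by - cdy
        num += ax * by - ay * bx             # cross terms -> sin component
        den += ax * bx + ay * by             # dot terms   -> cos component
    th = math.atan2(num, den)
    c, s = math.cos(th), math.sin(th)
    tx = cdx - (c * csx - s * csy)           # translation aligning the centroids
    ty = cdy - (s * csx + c * csy)
    return th, tx, ty

# Recover a known 90-degree rotation and (1, 2) translation.
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
dst = [(1.0, 2.0), (1.0, 3.0), (0.0, 2.0)]
th, tx, ty = align_2d(src, dst)
```

A full ICP loop would alternate this solve with a nearest-neighbor correspondence search, which is where the ambiguity in degenerate scenes enters.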
By using the scan-to-submap registration step, the RPE is reduced to a small fraction of the distance traversed.

The Experimental mine presents a more challenging subterranean environment with long and featureless corridors. During autonomous navigation, due to the high level of geometric degeneracy at multiple locations, the accumulation of errors from scan-to-scan registration leads to large average drift in translation relative to the distance traversed. By using the scan-to-submap registration step, the estimated motion of the robot is further refined and the drift in translation is substantially reduced. Fig. 5 presents partial estimated robot trajectories in the Safety Research and Experimental mines using the scan-to-scan and scan-to-submap matching methods. The results show that due to the low vertical resolution of the lidar scanner, scan-to-scan matching poorly estimates the angular and translational motion of the robot in the narrow tunnels of the mines. While this leads to

Fig. 4
Quantitative evaluation of drift from scan-to-scan and scan-to-submap matching as a function of distance traversed in the Bruceton Safety Research and Experimental mines. Each box comprises the RPE values ranging from the first to the third quartile. The median is indicated by the black horizontal bar. The whiskers extend to the farthest data points within 1.5 times the interquartile range. Outliers are shown as dots.

Fig. 5
Robot trajectories obtained using scan-to-scan and scan-to-submap registration methods in the Safety Research mine and Experimental mine.

significant drift in the estimated robot trajectories, the scan-to-submap matching step can reduce the drift by using an additional registration step where the output of the ICP algorithm is further refined. The Experimental mine represents a more challenging environment with many degenerate scenes due to long, narrow and symmetrical walls. The results show that, while using the scan-to-submap matching step reduces the error in robot estimates, the drift is still substantially large. This highlights the importance of a drift-resilient loop closure detection method that can reduce the accumulated error in the robot trajectory.

3.2 Determination of Geometric Degeneracy

Determination of reduced observability and geometric degeneracy is a crucial capability to evaluate the reliability of the front-end in reconstructing its full internal state based on measurements of the environment. Loss of observability can occur in any odometry system: scarcity of texture and salient features in a vision-based system, lack of thermal variations in a thermally flat environment (e.g., cold underground mines and caves) in thermal-inertial odometry, sparsity of distinctive geometric structures in a lidar-based front-end, or slippery terrain in a wheel-inertial odometry system. In lidar odometry, the reliability of odometric estimates is affected by noise and degeneracy that can arise from the geometric structure of the environment. For example, partial observability may occur when a robot traverses a tunnel or corridor with long, symmetrical and featureless walls. In this environment, the scan matching optimization process may result in multiple possible solutions, leading to inaccurate estimation of forward motion along the direction of the tunnel.
In order to evaluate the level of geometric degeneracy in an unknown environment, we develop a method using a distance cost function based on the rigid transformation obtained from the ICP algorithm. From the last iteration of the ICP algorithm, we obtain the sets of all corresponding points $P^t$ and $P^{t+1}$, and the relative pose measurement $u_t^{t+1} = [R, t]$ between consecutive lidar scans, where $R(\theta_x, \theta_y, \theta_z)$ is the 3D rotation defined in terms of Euler angles roll $\theta_x$, pitch $\theta_y$, and yaw $\theta_z$, and $t = [t_x, t_y, t_z]$ is the 3D translation vector. Given the $u_t^{t+1}$ that best aligns two consecutive lidar scans, the alignment error between the $k$-th pair of corresponding points $p_k^t$ and $p_k^{t+1}$ is given by

$$d_k = u_t^{t+1} p_k^t - p_k^{t+1}, \qquad (6)$$

where $d_k = [d_{k_x}, d_{k_y}, d_{k_z}]$ is the misalignment vector. The mean squared distance between the sets of all corresponding points $P^t$ and $P^{t+1}$ is given by

$$E(u_t^{t+1}, P^t, P^{t+1}) = \frac{1}{N_k} \sum_{k=1}^{N_k} \|d_k\|^2, \qquad (7)$$

where $E(\cdot)$ is a scalar-valued cost function, $N_k$ is the total number of corresponding points from the ICP algorithm, and $\|\cdot\|$ denotes the $L_2$ norm. The ICP-based scan matching can be formulated as an optimization problem that aims to find the $u_t^{t+1}$ that minimizes the cost function:

$$u_t^{t+1} = \operatorname*{argmin}_{\hat{u}_t^{t+1}} E(\hat{u}_t^{t+1}, P^t, P^{t+1}). \qquad (8)$$

Using the misalignment vector computed for the $k$-th pair of corresponding points in (6), a cost function can be defined in terms of $d_k$ as

$$\xi_k(u_t^{t+1}) = d_{k_x}^2 + d_{k_y}^2 + d_{k_z}^2. \qquad (9)$$

Assuming local linearity due to small motion between consecutive lidar scans, we compute the gradient of the cost function in order to propagate errors from sensor noise to uncertainty of the state estimates:

$$J_k^{t+1} = \frac{\partial \xi_k(u_t^{t+1})}{\partial(t_x, t_y, t_z, \theta_x, \theta_y, \theta_z)} = \left[\frac{\partial \xi_k}{\partial t_x} \; \frac{\partial \xi_k}{\partial t_y} \; \frac{\partial \xi_k}{\partial t_z} \; \frac{\partial \xi_k}{\partial \theta_x} \; \frac{\partial \xi_k}{\partial \theta_y} \; \frac{\partial \xi_k}{\partial \theta_z}\right], \qquad (10)$$

where $J_k^{t+1}$ is the gradient vector computed for the $k$-th pair of corresponding points at time step $t+1$. The Hessian of the cost function can be approximated [56] by the sum of outer products of the gradients over the set of all $N_k$ corresponding points:

$$H^{t+1} = \sum_{k=1}^{N_k} (J_k^{t+1})^T J_k^{t+1}. \qquad (11)$$

In the eigenvalue decomposition of $H^{t+1}$, the eigenvector associated with the smallest eigenvalue represents the least observable direction in terms of the estimated rigid 3D transformation. We find the relative scale between the most and least observable directions by computing the ratio of the maximal and minimal eigenvalues:

$$\kappa^{t+1} = \frac{\lambda_{max}}{\lambda_{min}}, \qquad (12)$$

where $\kappa^{t+1}$ is the condition number of the approximated Hessian $H^{t+1}$, which is a symmetric matrix. Although the eigenvalues depend on general environmental variables, such as the measurement noise or the number of corresponding points in the obtained lidar scans, computing the ratio removes the dependence on environmental variables that scale the eigenvalues. Thus, a large condition number corresponds to higher levels of geometric degeneracy [57]; this means a relatively large error is associated with one or more linear combinations of the parameters of the computed transformation $u_t^{t+1}$. A small condition number close to 1 corresponds to equal observability of all parameters in the computed transformation.
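To make Eqs. (6)-(12) concrete, the following Python sketch (our illustration, not the authors' code) estimates the condition number $\kappa$ for a set of point correspondences, using finite-difference gradients of the per-point squared misalignment $\xi_k$ in place of analytic derivatives:

```python
import numpy as np

def rigid_transform(params):
    """4x4 homogeneous transform from [tx, ty, tz, roll, pitch, yaw]."""
    tx, ty, tz, rx, ry, rz = params
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx
    T[:3, 3] = [tx, ty, tz]
    return T

def condition_number(src, dst, params, eps=1e-6):
    """kappa = lambda_max / lambda_min of H = sum_k J_k^T J_k (Eqs. 10-12),
    with J_k obtained numerically from the per-point cost xi_k (Eq. 9)."""
    def xi(p):
        T = rigid_transform(p)
        d = (src @ T[:3, :3].T + T[:3, 3]) - dst   # misalignment d_k (Eq. 6)
        return np.sum(d ** 2, axis=1)              # xi_k per correspondence
    base = xi(params)
    J = np.empty((len(src), 6))
    for i in range(6):
        p = np.array(params, dtype=float)
        p[i] += eps
        J[:, i] = (xi(p) - base) / eps             # numerical gradient (Eq. 10)
    H = J.T @ J                                    # sum of outer products (Eq. 11)
    w = np.linalg.eigvalsh(H)                      # eigenvalues, ascending
    return w[-1] / max(w[0], 1e-12)                # condition number (Eq. 12)
```

Under this sketch, an elongated, corridor-like point set yields a much larger $\kappa$ than a compact, feature-rich scene, mirroring the behavior described above.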
In our experiments, the pose is classified as degenerate if $\log(\kappa^{t+1}) \geq \kappa_{th}$, where $\kappa_{th}$ is a degeneracy threshold based on expectations of normal variability and validated by experimental results. Robot poses and corresponding key-scans that are classified as degenerate are removed from loop closure consideration; constraining the search for loop closures to the areas with the highest level of observability reduces the probability of false or inaccurate loop closures.

Fig. 6 shows the determination of geometric degeneracy in an indoor office environment as a robot navigates a long corridor with flat and symmetric walls. The plot of the log of the eigenvalues, $\log(\lambda)$, shows the response of all six eigenvalues of the approximated Hessian matrix. As the robot navigates the featureless corridor, a significant drop can be seen in the smallest eigenvalue of the Hessian in one section of the trajectory, while variations in the rest of the eigenvalues are minimal. This leads to an increase in the condition number in (12), as shown in the plot of $\log(\kappa)$ values. In this experiment, it was possible to verify that increased values of $\log(\kappa)$ corresponded to noisy lidar-based odometric estimates, based on the wheel-inertial odometry. We compare the output of the lidar odometry with the wheel-inertial odometry as the robot navigates the indoor office

Fig. 6 (a) Map of a corridor in an indoor office. Several locations are marked A, B, C, and D to show the response of the eigenvalues and the condition number to different levels of geometric degeneracy in the scene. The $\log(\lambda)$ plot shows the response of all six eigenvalues of the approximated Hessian as the robot navigates the corridor.
The $\log(\kappa)$ plot shows the log of the condition number computed from the ratio of the maximal and minimal eigenvalues. The lidar slip plot shows the difference between the forward translation of the robot as estimated by the wheel-inertial odometry and the lidar odometry at fixed time intervals. The wheel slippage in this experiment is known to be negligible.

environment. Often the wheel odometry is not reliable; however, as the robot moves at a low speed and the corridor has carpet flooring, the wheel slippage is known to be negligible. The lidar slip plot shows the difference in the computed forward motion based on the wheel-inertial and lidar odometry at fixed time intervals. The results show that the largest values of $\log(\kappa)$ correspond to the largest discrepancies between the lidar and wheel-inertial odometry. The degeneracy metric $\kappa$ responds to the geometric degeneracy of the featureless corridor as the forward motion of the robot along the walls becomes unobservable to the lidar odometry: a situation we call lidar-slip.

In order to evaluate the sensitivity and specificity of the geometric degeneracy detector, we rely on Receiver Operating Characteristic (ROC) analysis. A set of lidar scans is manually labeled, a subset of which correspond to degenerate scenes. The ROC curve is produced by computing the true positive and false positive rates at various thresholds. Fig. 7 reports the high discriminative ability of the geometric degeneracy detector, where a high area under the curve (AUC) value is achieved. Different decision thresholds marked on the plot show the sensitivity of the method to the choice of threshold.

Fig. 7
ROC plot reporting the performance of the environment degeneracy detector, created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various $\log(\kappa)$ threshold settings.

or otherwise known location or landmark. As discussed in Section 1, a common method to detect loop closures in graph-based lidar SLAM is to constrain the search for loop closures to a fixed space centered at the estimated robot poses. This can be formulated as

$$\|x_{\alpha_i} - x_{\alpha_j}\| < D_r, \qquad (13)$$

where we use the positional components of the robot poses $x_{\alpha_i}$ and $x_{\alpha_j}$ to compute the Euclidean distance $\|\cdot\|$ between two robot poses, and $D_r$ is the loop closure search radius. Although this is a reasonable constraint, in practice, if the drift in the robot trajectory exceeds $D_r$, loop closure opportunities will be missed. The impact of drift is illustrated in Fig. 8, where the search for loop closures is performed by registering a key-scan to historic key-scans that lie inside the loop closure search space. Since the search space has a fixed radius and is dependent on the estimated robot poses, this can lead to missed loop closure opportunities when the drift in the robot trajectory is larger than expected.

Fig. 9 presents a top-down view of the distorted 3D map of the Eagle Mine located in Julian, CA, which highlights the impact of missed loop closing opportunities on the quality and consistency of the constructed maps. While the branches shown in green and blue are identical, as they correspond to the same physical environment, they are not merged due to missed loop closures. While expanding the radius of the search space can be useful in improving loop closure detection, an expanded search space along the trajectory linearly increases the number of loop closure candidates and subsequently leads to a significant increase in the computational complexity associated with lidar scan registration.

As shown in Fig.
9, avoiding missed loop closing opportunities can be achieved by expanding the search radius as needed while strategically reducing the number of nodes considered for loop closure. This not only reduces computation, but also avoids distortions of the map due to poor convergence of the ICP algorithm [60] in scenes whose geometry does not constrain the optimization sufficiently. In Fig. 9, the salient features highlighted with yellow boxes can be used to develop a drift-resilient pre-matching step, as described in the next section.

In this section, we describe the proposed multi-stage loop closing method for a single-robot system, and then show how it can be extended to a multi-robot SLAM system to enable
Fig. 8
The search for loop closures is performed by registering a key-scan to historic key-scans that lie inside the loop closure search space. This can lead to missed loop closure opportunities if the drift in the robot trajectory is larger than expected.
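The fixed-radius candidate search of Eq. (13) can be sketched as follows. This is an illustrative implementation, not the authors' code; the `min_gap` parameter, which excludes the most recent poses from being matched against the query, is a hypothetical detail common in pose-graph SLAM front-ends rather than something specified in the paper:

```python
import math

def loop_closure_candidates(poses, query_index, radius, min_gap=10):
    """Return indices of historic poses whose Euclidean distance to the
    query pose is below `radius` (Eq. 13). Each pose is an (x, y, z) tuple
    holding the positional components of the robot pose."""
    query = poses[query_index]
    candidates = []
    # skip the `min_gap` most recent poses so a key-scan is not matched
    # against its immediate neighbors
    for i, pose in enumerate(poses[:max(0, query_index - min_gap)]):
        if math.dist(query, pose) < radius:
            candidates.append(i)
    return candidates
```

With large drift, the true revisit point drifts outside the radius and no candidate is returned, which is exactly the failure mode discussed above.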
Fig. 9
Partial map of the Eagle Mine, Julian, CA, obtained on the base station by merging the local pose graphs of two unmanned ground robots. The initial poses of the robots are shown as $x_\alpha$ and $x_\beta$. Due to significant drift in the estimated robot trajectories, loop closure opportunities are missed, which has resulted in two identical tunnels, shown in blue and green, although both tunnels represent the same physical environment. The areas marked with yellow boxes show salient 2D features that can be captured in the bird's-eye view of the lidar point cloud data to enable a pre-matching step for loop closure detection.

map alignment and merging on a base station in a centralized SLAM architecture. Fig. 10 provides an overview of the proposed multi-stage pipeline. The method consists of three main layers, namely pre-matching, geometric verification, and outlier rejection. In the pre-matching step, 2D occupancy grid maps constructed from key-scans are used for the assessment and evaluation of potential loop closures over segments of the mapped environment that are determined to be fully observable. In contrast to the BGLC method, which relies on a fixed search radius in the local neighborhood of each robot pose, the pre-matching step has the

Fig. 10
An overview of the saliency-based loop closure detection method.

key advantage of enabling a fast and global screening of putative loop closures and is thus pose-invariant. The pre-matching step is based on 2D occupancy grid map matching. The essence of using 2D occupancy grid maps is to capture the spatial configuration of local scenes by creating bird's-eye views of the lidar point cloud data. This enables the search for loop closures to be performed via occupancy grid map matching, where the objective is to find scenes with similar spatial configurations.

2D occupancy grid maps were first introduced by A. Elfes in 1985 [61] and have evolved over time [62] into one of the most common types of map used in robot navigation and path planning. In this work, each occupancy grid map has a fixed number of cells and corresponds to a fixed-size square area in the physical environment. Each cell stores a corresponding occupancy belief $b_i(x, y)$, with $0 \leq b_i(x, y) \leq 1$, representing the estimated probability that a static or stationary object is present in that cell location.

In order to construct an occupancy grid map from a key-scan, the ROS 2D costmap tool [63] is used to obtain a slice of the robot's surrounding 3D world from the lidar point cloud data. This is achieved by filtering the point cloud to remove the points that comprise the ground plane, as well as the points that appear higher than the highest point on the robot. The filtered point cloud is then projected onto the XY plane to construct the occupancy grid map. A cell is considered to be in one of three possible states: free for $b_i(x, y) < 0.5$, unknown for $b_i(x, y) = 0.5$, and occupied for $b_i(x, y) > 0.5$. We convert the belief map $b_i(x, y)$ to a binary map $o_i(x, y)$ using

$$o_i(x, y) = \begin{cases} 0 & \text{if } b_i(x, y) < 0.5 \\ 1 & \text{if } b_i(x, y) \geq 0.5 \end{cases}, \qquad (14)$$

where $o_i(x, y)$ can be interpreted as a 2D binary map image of the local scene.
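The projection-and-binarization pipeline can be sketched as follows. This is a simplified stand-in for the ROS 2D costmap tool: the cell size, grid dimensions, and ground-plane cutoff are illustrative values (the paper's parameters are not reproduced here), and any lidar return in a cell is treated as sufficient evidence of occupancy:

```python
def binary_occupancy_map(points, robot_height, cell_size=0.2, grid=50):
    """Project a lidar point cloud onto the XY plane and binarize it per
    Eq. (14). Ground-plane points (z below a small cutoff) and points above
    the robot are removed first."""
    counts = [[0] * grid for _ in range(grid)]
    for x, y, z in points:
        if z < 0.1 or z > robot_height:          # drop ground and overhead points
            continue
        col = int(x / cell_size) + grid // 2     # robot-centered cell indices
        row = int(y / cell_size) + grid // 2
        if 0 <= row < grid and 0 <= col < grid:
            counts[row][col] += 1
    # simplified occupancy belief: any lidar return in a cell marks it
    # occupied (b >= 0.5), yielding the binary map o_i(x, y) of Eq. (14)
    return [[1 if c > 0 else 0 for c in row] for row in counts]
```

The resulting nested list is the binary map image that the pre-matching step treats as a 2D image.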
While loop closure detection is a crucial requirement in single- and multi-robot SLAM systems, it is equally crucial to avoid closing loops in ambiguous areas with a high level of geometric degeneracy. Attempting loop closures in these areas can lead to the detection of false or weak loop closures that can result in catastrophic distortions of the map. In a geometrically degenerate scene, such as a long and featureless tunnel with flat walls, the spatial configuration of the scene captured in 2D occupancy grid maps also lacks distinctive features sufficient for localization, a condition that we refer to as lack of visual saliency. Fig. 11 presents examples of occupancy grid maps constructed from lidar scans in an underground tunnel. While Fig. 11(a) can lead to perceptual aliasing and data association ambiguity due to lack of visual saliency, Figs. 11(b-d) show maps with sufficiently distinctive spatial configurations
Fig. 11 (a) shows an occupancy grid map lacking visual saliency. (b), (c) and (d) show occupancy grid maps with salient visual features.

that can be used for global localization in the pre-matching step. By relying on the degeneracy metric $\kappa$ in (12), the level of observability of the scene can be analyzed in real-time to remove ambiguous regions, such as the one shown in Fig. 11(a), from loop closure consideration. In the next section, we develop metrics to determine putative loop closures based on occupancy grid map matching.

The search for loop closures can be formulated as a place recognition problem in which a robot establishes global localization by registering a salient occupancy grid map obtained at its current pose to a set of salient occupancy grid maps obtained in the past throughout its trajectory. This can be interpreted as a special instance of the 2D image registration problem where occupancy grid maps are viewed as 2D binary map images.

Let $o_{\alpha_i}$ and $o_{\alpha_j}$ denote two map images obtained from key-scans $z_{\alpha_i}$ and $z_{\alpha_j}$ of robot $\alpha$. From (2), $LC_s(\alpha_i, \alpha_j)$ is the pre-matching step, which evaluates the level of similarity between a pair of occupancy grid maps based on feature matching. In order to register two map images, the set of all salient 2D features is extracted from $o_{\alpha_i}$ and $o_{\alpha_j}$. In this paper, Oriented FAST and Rotated BRIEF (ORB) features, introduced by Ethan Rublee et al. [64], are used over other methods (i.e., SIFT, SURF) due to three key qualities: (i) outstanding speed and performance, (ii) resistance to image noise, and (iii) rotation invariance. We find the set of $N_{corr}$ corresponding features by using the Fast Library for Approximate Nearest Neighbors (FLANN) [65], a fast approximate nearest-neighbor method that is commonly used to match keypoints between corresponding images and to compute the set of corresponding feature points.
Finally, using the Random Sample Consensus (RANSAC) algorithm [66], a homography $\hat{M}_{o_{\alpha_i}}^{o_{\alpha_j}}$ that best describes the 2D geometric transformation between the occupancy grid maps is computed.

The objective of the pre-matching step is to identify the most likely loop closure candidates along the robot's trajectory based on the spatial similarity of the scenes. The correspondence confidence,

$$\zeta_{i,j} = \frac{N_{in}}{N_{corr}}, \qquad (15)$$

which is the ratio of the number of inliers $N_{in}$ to the total number of corresponding points $N_{corr}$, is often used as a fitness measure in feature-based image registration. In our experiments, if the number of inliers is below a fixed threshold, the map image is removed from loop closure consideration. In the best-case scenario, where all corresponding points are inliers, the correspondence confidence $\zeta_{i,j} = 1$.

In perceptually-degraded environments, we can encounter situations where, after finding the set of inliers using RANSAC, the correspondence confidence score is high even though the match is incorrect. In general, the theoretical breakdown point of all robust estimators, beyond which there is no general guarantee of success in the detection of true inliers, is when the percentage of outliers exceeds 50% [67]. As a result, based on the level of self-similarity and ambiguity between map images, the number of selected inliers and the accuracy of the estimated homography can vary. In order to reduce perceptual aliasing and data association ambiguity, we introduce the transformation confidence to identify high correspondence confidence scores that are unreliable. Let $f_k^{o_{\alpha_i}}$ and $f_k^{o_{\alpha_j}}$ denote the $k$-th pair of corresponding 2D features in map images $o_{\alpha_i}$ and $o_{\alpha_j}$, where each feature is defined by its
2D image coordinate. Given the homography computed by RANSAC, the residual error between the $k$-th pair of corresponding feature points is given by

$$r_k = \left\| \hat{M} f_k^{o_{\alpha_i}} - f_k^{o_{\alpha_j}} \right\|, \qquad (16)$$

where $r_k$ is the Euclidean distance between the $k$-th pair of aligned corresponding points. Using (16), the mean squared error for the set of all $N_k$ corresponding points is given by

$$\epsilon_{i,j} = \frac{1}{N_k} \sum_{k=1}^{N_k} r_k^2. \qquad (17)$$

Using the computed mean squared error $\epsilon_{i,j}$, we evaluate the quality of the computed homography between $o_{\alpha_i}$ and $o_{\alpha_j}$ by computing

$$\Lambda_{i,j} = \frac{1}{1 + \epsilon_{i,j}}, \qquad (18)$$

where $\Lambda_{i,j}$ is the transformation confidence score, which is close to 1 for a perfect match and close to zero for a false match with a large number of outlier correspondences. This is used to reject cases that have a high correspondence confidence but also a high matching error, by defining the similarity confidence metric $\Psi_{i,j}$, obtained from the product of the correspondence and transformation confidence scores:

$$\Psi_{i,j} = \zeta_{i,j} \cdot \Lambda_{i,j}. \qquad (19)$$

If the similarity confidence score is larger than a threshold, occupancy grid maps $o_{\alpha_i}$ and $o_{\alpha_j}$ are considered a loop closure candidate. For true matches with large correspondence confidence scores, the transformation confidence scores also tend to be large. Fig. 12 presents representative examples of occupancy grid map matching where the sets of corresponding and inlier features are visualized. Figs. 12(a-b) present the correspondence and transformation confidence scores for two false image matches. In Fig. 12(a), despite a large fraction of inlier features, the computed transformation confidence is small, indicating the poor quality of the match due to a false set of inlier features.

Fig. 12
The top and bottom rows present the sets of corresponding and inlier features for an occupancy grid map that is matched against three salient occupancy grid maps. The correspondence and transformation confidence scores are computed for each pair. Only (c) shows a true positive match.
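The three confidence metrics of Eqs. (15)-(19) can be sketched as follows. This is an illustration of the metrics, not the authors' code; `matches` and `inliers` are assumed lists of corresponding 2D feature-point pairs, and the homography is a plain 3x3 nested list:

```python
import math

def similarity_confidence(matches, inliers, H):
    """Return (zeta, Lambda, Psi): the correspondence confidence (Eq. 15),
    the transformation confidence (Eq. 18), and the similarity confidence
    (Eq. 19) for one pair of occupancy grid map images."""
    def project(H, p):
        # apply the 3x3 homography to a 2D point in homogeneous coordinates
        x, y = p
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
                (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

    zeta = len(inliers) / len(matches)                  # Eq. 15
    # Eqs. 16-17: mean squared residual over all corresponding points
    mse = sum(math.dist(project(H, fi), fj) ** 2
              for fi, fj in matches) / len(matches)
    lam = 1.0 / (1.0 + mse)                             # Eq. 18
    return zeta, lam, zeta * lam                        # Eq. 19
```

A match with many inliers but large residuals gets a low $\Lambda$, and a match with small residuals over few inliers gets a low $\zeta$, so only matches that are good in both senses score a high $\Psi$.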
Fig. 12(b) presents the opposite scenario, where the number of inlier features is relatively small while the computed transformation confidence score is large. This is mainly because most of the features are concentrated in patches in certain areas of the image, resulting in a low alignment error. Fig. 12(c) presents a true match where both the correspondence and transformation confidence scores are large. The results illustrate that while the individual correspondence and transformation confidence scores do not have enough discriminative power to determine a true loop closure, the product of both metrics can be used as a reliable similarity confidence metric to identify the most similar matches.

Fig. 13 presents 2D scatter plots of the correspondence and transformation confidence scores in an indoor office environment and an underground mine, where occupancy grid maps in each environment are evaluated for loop closure. On both plots, contours of similarity threshold values $\Psi_{th}$ are shown. In the underground mine, a suitably chosen correspondence confidence threshold $\zeta_{i,j}$ would select the same loop closure candidates as the similarity threshold $\Psi_{th}$. Moreover, there are very few matches with large correspondence scores that are rejected by the similarity threshold $\Psi_{th}$. However, for the indoor office environment with its self-similar office cubicles, a much larger number of candidate loop closures with large correspondence scores is rejected by using the similarity confidence threshold $\Psi_{th}$.

Let $\alpha_i$, $\alpha_j$, and $\hat{M}_{o_{\alpha_i}}^{o_{\alpha_j}}$ be the candidate loop closure key-nodes in the pose graph and the homography obtained from the pre-matching step. As presented in Fig. 10, we use a geometric verification step, $LC_g(LC_s(\alpha_i, \alpha_j))$ in (2), to verify the quality of the loop closure candidate, using the ICP-based scan matching in (8) to align the corresponding key-scans.
The performance of the ICP algorithm relies heavily on the quality of initialization; with a poor initial guess, the algorithm is susceptible to local minima, especially if the actual 3D motion between two lidar scans is large. In order to improve convergence of the ICP algorithm to the optimal solution in the geometric verification step, the algorithm is seeded using the 2D rotation obtained from the computed homography $\hat{M}_{o_{\alpha_i}}^{o_{\alpha_j}}$: a 3D transformation matrix with a $Z$-axis rotation and zero translation is constructed to initialize the ICP algorithm. This can be interpreted as a two-stage optimization process where the objective is to refine the geometric transformation obtained from the pre-matching step to find the best estimate of the relative 3D motion between two robot poses. After aligning the point clouds, the convergence criterion is evaluated using

Fig. 13
2D scatter plots showing the correspondence and transformation confidence scores for occupancy grid map matching in an indoor office environment and an underground mine.

the alignment error $E(\cdot)$ from (7) before sending the loop closure constraint to the SLAM back-end for outlier rejection verification.

The SLAM back-end is implemented using the Georgia Tech Smoothing and Mapping (GTSAM) [70] library, which implements smoothing and mapping using factor graphs and Bayes networks as the underlying computing paradigm. Since pose graph optimization relying on least-squares optimization methods is not robust against outliers, the back-end relies on an outlier loop closure detection method, illustrated in Fig. 14, to prevent the optimization from producing incorrect solutions when the front-end produces spurious loop closures. The outlier loop closure detection is based on the Pairwise Consistent Measurement Set Maximization (PCM) method presented in [48, 71]. Once the SLAM back-end receives putative loop closures from the geometric verification step, the quality of the loop closures is evaluated both in terms of consistency with the odometry edges and in terms of pairwise consistency with previous loop closures.

As illustrated in Fig. 14, assuming negligible measurement noise, when a loop is closed and an edge is added between two key-nodes in the pose graph, the composition of the loop closure edge with all odometric and loop closure edges along the cycle equals the identity [73]. Using this basic observation, the back-end rejects loop closure detections that lead to a large transformation error along cycles in the pose graph. Once a loop closure is confirmed and a new constraint is added between two key-nodes, the pose graph is optimized using iterative nonlinear optimization methods (e.g., Levenberg-Marquardt) to obtain the best estimate of the robot poses from the set of all constraints.
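Returning to the seeding of the geometric verification step described above: extracting the planar rotation from the homography and building a $Z$-axis-only initial transform can be sketched as follows. The function name and matrix layout are our illustration; the paper does not give code for this step:

```python
import math

def icp_seed_from_homography(H):
    """Construct a 4x4 initial transform for ICP from the 2D rotation
    encoded in a homography between two bird's-eye-view map images:
    recover the yaw angle from the upper-left 2x2 block and build a
    Z-axis rotation with zero translation."""
    yaw = math.atan2(H[1][0], H[0][0])   # rotation angle of the 2x2 block
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c,  -s,  0.0, 0.0],
            [s,   c,  0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]
```

Because the occupancy grid maps are top-down projections, only the yaw component of the relative motion is recoverable from the homography, which is why the seed carries no translation or roll/pitch; ICP then refines the full 3D transform.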
Algorithm 1 summarizes the multi-stage loop closure detection process.

Assuming the initial positions of the robots are known and there is at least partial overlap between the robot trajectories, map merging is performed on the base station by finding the
Fig. 14
Outlier rejection is performed by evaluating the quality of a loop closure in terms of its consistency with the odometry edges, as well as its pairwise consistency with previously closed loops in the pose graph.
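The consistency check sketched in the caption rests on the observation that composing the transforms around a pose-graph cycle should yield the identity. The following is a simplified 2D stand-in for that check (the paper's check operates on 3D transforms within PCM); `se2` builds a planar pose as a 3x3 homogeneous matrix:

```python
import math

def se2(x, y, theta):
    """Homogeneous 3x3 matrix for a planar pose or edge (x, y, theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]]

def matmul(A, B):
    """3x3 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def cycle_error(edges):
    """Compose all edges around a pose-graph cycle and return the
    translational drift of the result; for a consistent loop closure the
    composition is near identity, so the drift is near zero."""
    T = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for e in edges:
        T = matmul(T, e)
    return math.hypot(T[0][2], T[1][2])
```

A loop closure whose cycle error exceeds a threshold derived from the edge covariances would be rejected as an outlier; here the threshold is left to the caller.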
Algorithm 1
Saliency-Based Geometric Loop Closing (SGLC)
Input: a query lidar scan z_i
Input: a set of previously obtained lidar scans Z
Output: a loop closure detection

1:  procedure FindLoopClosure(Z, z_i)
2:    if (log(κ_i) ≤ κ_th) then                            ▷ Eq. 12
3:      o_i ← ConstructOccupancyGridMap(z_i)
4:      for (∀ z_j ∈ Z) do
5:        if (log(κ_j) ≤ κ_th) then
6:          o_j ← ConstructOccupancyGridMap(z_j)
7:          if (Prematch(o_i, o_j)) then                   ▷ Pre-matching step
8:            return M_ij
9:            if (GeometricVerification(z_i, z_j, M_ij)) then ▷ Geometric verification step
10:             return u_ij
11:             if (NotAnOutlier(u_ij)) then               ▷ Outlier rejection step
12:             return U ← U ∪ u_ij                        ▷ Add new loop closure to the set of constraints
13:           end if
14:         end if
15:       end if
16:     end if
17:   end for
18: end if
19: end procedure

correspondences between the pose graphs. This is achieved using the same multi-stage loop closure detection process used in the single-robot scenario, where during the pre-matching step the search is constrained to the most observable areas in the local pose graphs to identify putative loop closures. Upon determination of inter-robot loop closure candidates, each candidate undergoes the geometric verification and outlier rejection steps before performing a global pose graph optimization to find the most probable configuration of the nodes in the merged pose graphs, given the set of all intra- and inter-robot constraints.
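The control flow of Algorithm 1 can be mirrored in a short Python skeleton. The scan objects and the four callables are placeholders for the paper's components: `build_map` constructs an occupancy grid map, `prematch` returns a homography or `None`, `geometric_verification` returns a relative pose `u_ij` or `None`, and `is_outlier` stands in for the PCM-style back-end check:

```python
import math

def find_loop_closures(query, history, kappa_th,
                       build_map, prematch, geometric_verification, is_outlier):
    """Sketch of Algorithm 1 (SGLC): degenerate poses are skipped, and each
    surviving pair passes through pre-matching, geometric verification, and
    outlier rejection before being accepted as a loop closure constraint."""
    constraints = []
    if math.log(query.kappa) > kappa_th:        # query pose is degenerate
        return constraints
    o_i = build_map(query)
    for past in history:
        if math.log(past.kappa) > kappa_th:     # skip degenerate key-scans
            continue
        o_j = build_map(past)
        M_ij = prematch(o_i, o_j)               # pre-matching step
        if M_ij is None:
            continue
        u_ij = geometric_verification(query, past, M_ij)  # ICP-based check
        if u_ij is None or is_outlier(u_ij):
            continue
        constraints.append((query, past, u_ij)) # add loop closure constraint
    return constraints
```

This is only a scaffold for the algorithm's structure; the real pipeline additionally threads the constraints into the GTSAM back-end for pose graph optimization.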
In this section, we present performance and computational analysis of our proposed saliency-based geometric loop closing (SGLC) method in a variety of complex underground environments listed in Table 1. Moreover, we provide quantitative and qualitative comparisons of the localization and mapping results against the basic geometric loop closing (BGLC) method, as well as against LeGO-LOAM, a state-of-the-art SLAM method with loop closure detection capability recently introduced by Shan et al. [28].
Table 1
List of the explored underground mines in our field experiments.

Name of the mine               Autonomously Traversed Distance   Type of mine   Location
Arch Pocahontas Mine           1100 m                            Coal Mine      Beckley, WV
Beckley Exhibition Mine        1000 m                            Coal Mine      Beckley, WV
Bruceton Safety Research Mine  1400 m                            Coal Mine      Pittsburgh, PA
Bruceton Experimental Mine     700 m                             Coal Mine      Pittsburgh, PA
Highland Mine                  1400 m                            Coal Mine      Logan, WV
Eagle Mine                     500 m                             Gold Mine      Julian, CA
As presented in Fig. 15, we use a centralized multi-robot SLAM architecture, developed in the context of the DARPA Subterranean Challenge, where the main objective is to explore and map unknown and extreme underground environments, as shown in Fig. 16, using autonomous single- or multi-robot systems. When a robot exploring the unknown environment is within wireless communication range, it communicates its pose graph and the constructed map to a base station. The base station is an Intel Hades Canyon NUC8i7HVKVA that is responsible for receiving and merging the local pose graphs of the individual robots to build a consistent global map of the unknown environment. In the context of this Challenge, the base station does not communicate the constructed global maps back to the robots, and thus real-time performance is not strictly required. However, the constructed global maps can be used for reliable navigation of the robots in the future. Hence, the proposed architecture can be scaled to larger robot teams.

Each robot is a Husky A200 that relies on a VLP-16 Puck Lite lidar scanner [74] and an Intel NUC 7i7DNBE (32 GB RAM) processor for onboard simultaneous localization and mapping. Relying on an onboard Intel RealSense D435 RGB-D camera, each robot uses vision-based object detection based on the You Only Look Once (YOLO) [68] algorithm to detect, classify, and localize objects of interest in the environment.

As obtaining ground truth data in large-scale underground environments is a challenging task, similar to our method in LAMP [48], we obtain a proxy for the ground truth trajectories by enforcing the ground truth locations of the known objects and fiducial mark-
Fig. 15
Centralized multi-robot collaborative mapping architecture.
Fig. 16
Examples of the sensor-degraded underground environments explored by autonomous ground robots (Husky A200). From left to right: Eagle Mine in Julian, CA; Arch Pocahontas Mine in Beckley, WV; and Edgar Mine in Idaho Springs, CO.

ers (as shown in Fig. 16) in the best pose graph of each robot, and using the resulting optimized trajectory as ground truth.

4.1 Performance analysis of pre-matching step

We evaluate the sensitivity and specificity of the similarity confidence metric used in the pre-matching step in (19) in five underground environments by using ROC analysis. In each environment, a set of salient occupancy grid maps is obtained from the key-scans, a subset of which correspond to previously visited locations and represent true loop closures. Fig. 17 shows that loop closures can be reliably detected in the various environments by relying on occupancy grid map matching, where a high average AUC is achieved in correctly identifying loop closures across all environments. Different threshold values are marked on each plot to show the low sensitivity of loop closure detection to the choice of threshold across the five underground environments.

While the indoor office represents a structured environment with salient geometric features, accurate place recognition in the pre-matching step is shown to be more challenging in this environment than in the underground mines. This is mainly due to perceptual aliasing and data association ambiguity arising from the self-similarity of the spatial configuration of local scenes in the indoor office environment. The presence of multiple office cubicles with identical geometric structures leads to a larger number of false positive detections in the pre-matching step. Underground tunnels have fewer repetitive geometric structures, which contributes to the reduction of perceptual aliasing in the pre-matching step.
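The ROC analysis used throughout this section can be reproduced with a few lines of plain Python; this sketch (our illustration, not the authors' tooling) sweeps a decision threshold over the similarity confidence scores and integrates the curve with the trapezoidal rule:

```python
def roc_curve(scores, labels):
    """Return (FPR, TPR) points obtained by sweeping a decision threshold
    over the scores; labels are 1 for true loop closures, 0 otherwise.
    (Assumes at least one positive and one negative label.)"""
    pairs = sorted(zip(scores, labels), reverse=True)
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _score, label in pairs:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

An AUC of 1.0 corresponds to a detector that ranks every true loop closure above every false one, while 0.5 corresponds to chance performance.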
This highlights the importance of using a geometric verification step, and a back-end equipped with an outlier rejection capability, to verify each loop closure candidate in terms of accuracy and consistency before adding it to the pose graph.

4.2 Computation analysis of pre-matching step

In this experiment, we evaluate the execution time of the pre-matching and geometric verification steps in order to characterize the computational complexity introduced by the pre-matching step. In each experiment, a set of occupancy grid maps is used as query map

Fig. 17
The ROC curves and AUC for the pre-matching step in five underground environments, created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various similarity confidence Ψ threshold settings.

images. Using the pre-matching process described in Section 3.3.1, each query map image is registered to all map images in the corresponding dataset to obtain the similarity confidence scores that describe the level of visual similarity between the query image and all map images in the dataset. Fig. 18 shows the execution time of loop closure detection for datasets of different sizes. The box plots show that the average execution time does not grow linearly as the size of the datasets increases. This is because map images with a small number of corresponding features are discarded from loop closure consideration, and their similarity confidence score is set to zero before further processing.

As presented in Section 3.3, considering the significant number of points in each lidar scan, if the search space in the basic geometric loop closing method is expanded to include all nodes in the graph, the quadratic computational complexity $O(n^2)$ [58] of the ICP algorithm incurs a prohibitive computational cost for the BGLC method in the detection of loop closures for $n$ nodes in the pose graph. Figs. 19(a-b) show maps obtained in an indoor office environment using the BGLC method, where two different search radii are used for the detection of loop closures. As illustrated by the blue lines between loop closure candidates, increasing the search radius results in a dramatic increase in the number of attempted loop closures.
This not only increases the computational load associated with ICP-based lidar scan registration, but also increases the probability of spurious loop closures, as shown in the map in Fig. 19-(c). In our proposed SGLC method, the pre-matching step relies on a pair-wise occupancy grid map registration process with O(n) complexity [59]. Moreover, the determination of geometric degeneracy helps to reduce the number of qualifying nodes by constraining the search to areas with full observability. In a feature-rich environment where all nodes qualify for the pre-matching step, the algorithm will incur a computational complexity cost of O(n) for the set of all n binary map images, which has a lower complexity than the BGLC method and can be executed on a separate thread to avoid affecting the real-time performance of the front-end.

Fig. 18 Execution time analysis for the pre-matching step. Each box comprises the computation time values ranging from the first to the third quartile. The median is indicated by the dashed red horizontal bar. The whiskers extend to the farthest data points that are within 1.5 times the interquartile range.

Fig. 19 Expanding the loop closure search radius from m in (a) to m in (b) based on the BGLC method dramatically increases the number of attempted loop closures. The blue lines show all node pairs in the pose graph that are considered for loop closure. This, in turn, increases the probability of spurious loop closures that can result in catastrophic distortions of the map, as shown in (c).

4.3 Large-scale mapping using the SGLC method

In this section, we provide an analysis of localization and mapping accuracy, and a comparison with the state of the art, in a variety of large-scale and extreme unknown environments. In our earlier work LAMP [48], a capability was developed to allow the operator to manually add loop closure constraints to the pose graph if loop closures were missed due to large drift in the estimated robot trajectory. While this is an effective tool to reduce the drift and improve the quality of the constructed maps, it requires a human in the loop and thus is unreliable in large-scale and complex underground environments with intermittent or regional wireless connections between the operator and the robots.
In the experiments presented in this section, our proposed SGLC method removes the human from the loop and enables automatic detection of loop closures that would otherwise be missed due to the accumulation of errors in robot trajectories.

Fig. 20 and Fig. 21 show examples of constructed maps in two underground mines. At the Beckley Coal mine, a robot autonomously traverses more than km in the mine and returns to the start location to close the loop. Fig. 20-(a) shows the constructed map in an open-loop scenario where no loop closure detection capability is used. Fig. 20-(b-c) show the maps obtained in two trials using the BGLC method, where a fixed search radius of m is used. The results show many loop closure opportunities are missed, leading to dramatic distortion of the maps. Fig. 20-(d) shows the map obtained using our proposed SGLC method. As the method is pose-invariant, it is unaffected by the accumulation of errors in the robot trajectory, which results in successful detection of loop closures and, subsequently, a more consistent representation of the environment.

Fig. 20 Top-down view of the 3D map of the Beckley Coal mine, Beckley, WV. (a) Mapping with no loop closure detection capability. (b) and (c) The BGLC method results in distorted maps due to spurious and missed loop closure opportunities. (d) Map using the proposed SGLC method.

Fig. 21 presents a multi-robot collaborative mapping scenario where local maps obtained by a team of two robots deployed in the Eagle mine are merged on a base station. Both robots, with known initial poses in the world coordinate system, autonomously navigate m inside the tunnels of the mine. Fig. 21 presents the top (a) and side (b) views of the maps obtained on the base station in an open-loop scenario, without using any loop closure detection capability. As the error in robot trajectories accumulates over time, the maps drift apart unbounded in the absence of loop closure detections. Using the BGLC method, for every key-node in one graph, key-nodes in the other graph that lie inside a search radius of m are considered for loop closure. As presented in Fig. 21-(c), while the method performs well when the drift in robot trajectories is small, it misses many loop closure opportunities when the trajectories drift apart due to the accumulation of noisy odometric estimates. Fig. 21-(d) shows a successful loop closure using the proposed SGLC method before performing pose graph optimization. Using a salient geometric feature (i.e., a T-junction in the tunnel), an inter-robot loop closure is detected and the diverged trajectories are joined again. Fig. 21-(e) shows a consistent map of the environment after pose graph optimization and minimization of the error.

Fig. 21 Map of the Eagle mine, Julian, CA, obtained on a base station. (a) and (b) present the top-down and side views of the 3D map obtained on the base station with no loop closure detection capability. (c) Mapping result using the BGLC method. (d) Mapping result using the proposed SGLC method, before performing PGO, to show the detected loop closure between two nodes in the graphs. (e) Global map after performing PGO using the proposed SGLC method.

Fig. 22 reports examples of multi-robot mapping for a team of two robots deployed in the Bruceton Experimental mine and Safety Research Coal mine during the Tunnel Circuit of the DARPA Subterranean Challenge [49], in August 2019. Using the BGLC method, large drifts are visible in both maps, as the base station fails to detect inter-robot loop closures to merge the maps due to many missed loop closure opportunities when the maps start to drift apart. The maps obtained using the proposed SGLC method show a more consistent 3D representation of the mines due to more frequent loop closures that help reduce the drift in the estimated trajectories.

At the Tunnel Circuit of the DARPA Subterranean Challenge, a variety of objects (e.g., backpack, fire extinguisher, drill, survivor, and a cellphone) were placed at unknown locations in the tunnels of the mines. The autonomous robots were used to detect and localize these objects in the environment by relying on the onboard RGB-D camera and the lidar-based SLAM. We use the estimated locations of the detected objects to provide a quantitative evaluation of localization and mapping accuracy. The box plots in Fig. 22 report a quantitative evaluation of localization accuracy using no loop closure detection capability, the BGLC method, and the proposed SGLC method, by comparing the estimated locations of detected objects against the ground truth data provided by DARPA. With no loop closure detection capability, the robot pose uncertainty and the location uncertainty of detected objects grow as the errors in lidar odometry accumulate. By using the BGLC method, the drift is reduced compared to the open-loop trajectory, but several missed loop closure opportunities lead to the accumulation of errors in robot trajectories, which manifests itself as large object localization errors. Using the proposed SGLC method, loop closures are consistently detected as the robot navigates the unknown environment, which results in significantly more accurate localization and mapping results. Fig. 23 and Fig. 24 provide a comparison of the 3D maps obtained using the LeGO-LOAM, BGLC, and proposed SGLC methods in five perceptually-degraded and large-scale underground environments.
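The object-based accuracy evaluation described above reduces to comparing each estimated artifact position against its surveyed position. A minimal sketch, using made-up placeholder coordinates rather than the DARPA ground truth:

```python
import numpy as np

# Hypothetical estimated vs. ground-truth 3D artifact locations in meters
# (placeholder values; the real evaluation compares against DARPA-provided
# ground truth for objects such as the backpack, drill, and fire extinguisher).
ground_truth = np.array([[12.0,  3.5, 0.4],
                         [48.2, -7.1, 0.3],
                         [95.6, 15.0, 0.5]])
estimated    = np.array([[12.6,  3.1, 0.5],
                         [50.1, -8.0, 0.2],
                         [99.0, 17.2, 0.6]])

# Per-object localization error: Euclidean distance between estimate and truth.
errors = np.linalg.norm(estimated - ground_truth, axis=1)

# Summary statistics of the kind aggregated into the box plots.
print("errors [m]:", np.round(errors, 2))
print("median error [m]:", round(float(np.median(errors)), 2))
```

Collecting these per-object errors over all runs yields the distributions summarized in the box plots of Fig. 22.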
While in LeGO-LOAM the default search radius for loop closures is set to m, in our experiments the search radius was increased to m for both the LeGO-LOAM and BGLC methods to improve loop closure detection in the presence of large drift. The results show that both the LeGO-LOAM and BGLC methods initially perform well as long as the drift in robot trajectories is small, but miss several loop closure opportunities in the presence of large drift. This exposes the vulnerability of pose-dependent loop closure detection methods, especially in large-scale and perceptually-degraded environments where the accumulation of odometry errors can be significant. The results show the proposed SGLC method consistently outperforms the LeGO-LOAM and BGLC methods due to its drift-resiliency.

Fig. 22 Maps of the Bruceton Experimental and Safety Research mines constructed using the BGLC and proposed SGLC methods. The box plots report the object localization errors in the final maps without loop closure detection capability, as well as with the BGLC and the proposed SGLC methods.

Fig. 23 Maps of the indoor office, Beckley Coal mine, and Safety Research mine, using the LeGO-LOAM, BGLC, and proposed SGLC methods.

Fig. 24 Maps of the Experimental mine and Eagle mine, using the LeGO-LOAM, BGLC, and proposed SGLC methods.

Fig. 25 reports a comparison of mapping results based on the Absolute Pose Error (APE) metric for the LeGO-LOAM, BGLC, and proposed SGLC methods, using datasets obtained from four perceptually-degraded and large-scale subterranean environments. The box plots are obtained by running each algorithm times on each dataset and recording the localization accuracy of several known landmarks along the robot trajectory. The results show the SGLC method consistently outperforms the LeGO-LOAM and BGLC methods across all environments with different levels of perceptual degradation, and thus the drift in estimated robot trajectories is significantly reduced, as reflected in small APE values. The results also show that all three methods suffer from relatively larger drift in the Experimental and Safety Research mines. This is due to the presence of several long, flat, and featureless corridors in these mines, which leads to noisier odometric estimates.
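A minimal sketch of the translational APE computation behind such comparisons, on synthetic data (the 2% lateral drift rate is an illustrative assumption; full APE also accounts for rotation):

```python
import numpy as np

def absolute_pose_error(est_xyz, gt_xyz):
    """Translational Absolute Pose Error: per-pose Euclidean distance between
    the estimated trajectory and ground truth, assuming both are already
    expressed in a common reference frame."""
    est_xyz = np.asarray(est_xyz, dtype=float)
    gt_xyz = np.asarray(gt_xyz, dtype=float)
    err = np.linalg.norm(est_xyz - gt_xyz, axis=1)
    return {"rmse": float(np.sqrt(np.mean(err ** 2))),
            "mean": float(err.mean()),
            "max": float(err.max())}

# Toy ground truth: a straight 100 m path, and a drifting estimate of it.
t = np.linspace(0.0, 100.0, 200)
gt = np.stack([t, np.zeros_like(t), np.zeros_like(t)], axis=1)
drift = np.stack([np.zeros_like(t), 0.02 * t, np.zeros_like(t)], axis=1)
est = gt + drift  # lateral drift growing with distance traveled

stats = absolute_pose_error(est, gt)
print(stats)  # 2% drift over 100 m -> max translational error of 2 m
```

Repeating this over multiple runs per environment yields the APE distributions shown as box plots in Fig. 25.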
Fig. 25 Quantitative comparison of mapping results based on the Absolute Pose Error (APE) using the LeGO-LOAM, BGLC, and proposed SGLC methods in perceptually-degraded and large-scale subterranean environments. The box plots are obtained by running each algorithm times on the dataset collected in each environment and measuring the absolute pose error.

While dramatic progress has been made over the past few decades in the field of single- and multi-robot SLAM, localization and mapping in unknown, large-scale, and perceptually-degraded environments presents a variety of challenges due to the sparsity of distinctive landmarks, self-similarity of features, and poor illumination. In these extreme settings, the front-end is exposed to major challenges, including data association ambiguity, that can result in significant drift in the estimated robot trajectory during long-term or large-scale navigation. This highlights the importance of drift-resilient and robust loop closure detection methods that can reduce the drift, and also enable merging individual maps obtained by a team of robots into a globally consistent map of the environment in a multi-robot SLAM system.

In this paper, we developed a set of metrics and methods to improve the detection of intra- and inter-robot loop closures to increase the robustness and reliability of autonomous navigation in unknown and extreme subterranean environments. We developed a degeneracy-aware SLAM front-end that is capable of determining the level of geometric degeneracy through eigenanalysis of the solution of the ICP algorithm. This was validated by comparing the results obtained from lidar odometry against wheel-inertial odometry in cases where the wheel odometry was known to be reliable.
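As a simplified illustration of this eigenanalysis, the sketch below builds a surrogate translational information matrix from surface normals and uses the ratio of its extreme eigenvalues as a condition-number-style degeneracy score. The metric κ in the paper operates on the full ICP solution, so treat this only as a conceptual sketch.

```python
import numpy as np

def degeneracy_kappa(normals):
    """Condition-number-style degeneracy score: accumulate the information
    matrix A = sum(n n^T) over unit surface normals and return the ratio of
    its largest to smallest eigenvalue. A huge ratio indicates a translation
    direction that is (nearly) unconstrained by the scene geometry."""
    N = np.asarray(normals, dtype=float)
    N = N / np.linalg.norm(N, axis=1, keepdims=True)
    A = N.T @ N                   # 3x3 information matrix for translation
    w = np.linalg.eigvalsh(A)     # eigenvalues in ascending order
    return float(w[-1] / max(w[0], 1e-12))

# Long featureless corridor: only wall (+/-x) and floor (+z) normals are
# observed, so nothing constrains translation along the corridor (y) axis.
corridor = [[1, 0, 0]] * 40 + [[-1, 0, 0]] * 40 + [[0, 0, 1]] * 20

# Feature-rich scene: normals spread across all three axes.
rich = [[1, 0, 0], [0, 1, 0], [0, 0, 1]] * 30

print(degeneracy_kappa(rich))      # near 1: well-conditioned
print(degeneracy_kappa(corridor))  # enormous: degenerate along y
```

Thresholding such a score is one way a front-end can flag geometrically degenerate areas and exclude them from loop closure consideration.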
While most state-of-the-art methods attempt loop closures periodically without considering the level of geometric degeneracy of the scene, it is crucial to avoid closing loops in ambiguous areas with a high level of geometric degeneracy, as doing so could result in catastrophic distortions of the map. Using our proposed degeneracy-aware front-end, areas with a high level of geometric degeneracy, which can lead to data association ambiguity and spurious loop closures, are removed from loop closure consideration. In contrast to traditional loop closure detection methods that search for loop closures only locally, this capability allows the search for loop closures to be expanded as needed to account for drift, without increasing the probability of spurious loop closures. Through analysis of the geometric degeneracy metric κ in a variety of indoor and underground environments, it was shown that the metric can be reliably used to determine geometric degeneracy in unknown environments.

Moreover, a drift-resilient geometric loop closing method based on saliency maps was proposed to enable a multi-stage place recognition pipeline through which loop closure candidates can be identified over the entire robot trajectory, independent of the estimated robot pose. This increases the robustness and accuracy of loop closure detection, especially in large-scale and perceptually-degraded environments. Putative loop closures were identified by evaluating the similarity of the spatial configuration of the scenes using a feature-based image registration process, before being validated using geometric verification and outlier rejection steps.
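To illustrate the notion of a similarity confidence score between map images, the toy sketch below scores two binary occupancy-grid images by their best normalized overlap over small integer shifts. The actual pipeline uses feature-based image registration, so this is only a conceptual stand-in with made-up map patterns.

```python
import numpy as np

def similarity_confidence(query, candidate, max_shift=3):
    """Toy stand-in for a pre-matching similarity score: the best normalized
    overlap between two binary occupancy-grid images over a small set of
    integer shifts (a crude alignment search, not feature-based registration)."""
    best = 0.0
    occupied = max(int(query.sum()), 1)
    for dx in range(-max_shift, max_shift + 1):
        for dy in range(-max_shift, max_shift + 1):
            shifted = np.roll(candidate, (dx, dy), axis=(0, 1))
            overlap = int(np.logical_and(query > 0, shifted > 0).sum())
            best = max(best, overlap / occupied)
    return best

# A T-junction-like pattern, a shifted copy of it, and an unrelated wall.
grid = np.zeros((20, 20), dtype=np.uint8)
grid[10, :] = 1        # horizontal corridor
grid[10:, 10] = 1      # stem of the T-junction
shifted_copy = np.roll(grid, (1, 2), axis=(0, 1))
wall = np.zeros((20, 20), dtype=np.uint8)
wall[:, 3] = 1

print(similarity_confidence(grid, shifted_copy))  # close to 1.0
print(similarity_confidence(grid, wall))          # much lower
```

Thresholding such a score (the role played by Ψ in the pre-matching step) separates plausible revisits from unrelated places before the more expensive geometric verification is attempted.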
Through extensive real-world experiments and comparisons with state-of-the-art methods, it was shown that the proposed method improves localization and mapping accuracy in perceptually-degraded subterranean environments where commonly used loop closure detection methods exhibit insufficient robustness and accuracy.

Future work will focus on developing a robust multi-sensor, lidar-centric front-end using the geometric degeneracy detection capability. While in this work the determination of geometric degeneracy was used only to constrain the search for loop closures to the most observable areas in an unknown environment, the same capability can be used to enable a robust multi-sensor, degeneracy-aware odometry system in which the front-end evaluates the reliability of odometric estimates in real time and switches to alternative odometry sources where the unknown environment is determined to be geometrically degenerate. Another direction for future research is extending this work to a distributed multi-robot SLAM architecture. While the centralized multi-robot architecture presented in this paper works well for a small team of robots, where a base station is in charge of map alignment and merging, in large-scale underground environments with sporadic communication with the base station a distributed collaborative architecture is desirable, enabling a team of robots to exchange their maps in order to maintain a consistent global map of the explored environment.