MULLS: Versatile LiDAR SLAM via Multi-metric Linear Least Square
Yue Pan, Pengchuan Xiao, Yujie He, Zhenlei Shao and Zesong Li

Abstract — The rapid development of autonomous driving and mobile mapping calls for off-the-shelf LiDAR SLAM solutions that are adaptive to LiDARs of different specifications in various complex scenarios. To this end, we propose MULLS, an efficient, low-drift, and versatile 3D LiDAR SLAM system. For the front-end, roughly classified feature points (ground, facade, pillar, beam, etc.) are extracted from each frame using dual-threshold ground filtering and principal components analysis. The registration between the current frame and the local submap is then accomplished efficiently by the proposed multi-metric linear least square iterative closest point algorithm. Point-to-point (plane, line) error metrics within each point class are jointly optimized with a linear approximation to estimate the ego-motion. Static feature points of the registered frame are appended to the local map to keep it updated. For the back-end, hierarchical pose graph optimization is conducted among regularly stored history submaps to reduce the drift resulting from dead reckoning. Extensive experiments are carried out on three datasets with more than 100,000 frames collected by six types of LiDAR in various outdoor and indoor scenarios. On the KITTI benchmark, MULLS ranks among the top LiDAR-only SLAM systems with real-time performance.

Y. Pan is with the Department of Civil, Environmental and Geomatic Engineering, ETH Zurich, Switzerland ([email protected]). P. Xiao, Z. Shao and Z. Li are with Hesai Technology Co., Ltd., Shanghai, China ({xiaopengchuan_intern, shaozhenlei, lizesong}@hesaitech.com). Y. He is with the School of Engineering, EPFL, Lausanne, Switzerland ([email protected]).
I. INTRODUCTION
Simultaneous localization and mapping (SLAM) plays a key role in robotics tasks, including robot navigation [1], field surveying [2] and high-definition map production [3] for autonomous driving. Compared with vision-based [4]–[6] and RGB-D-based [7]–[9] SLAM systems, LiDAR SLAM systems are more reliable under various lighting conditions and are more suitable for tasks demanding dense 3D maps. The past decade has witnessed enormous development of LiDAR odometry (LO) [10]–[25] and LiDAR-based loop closure detection [26]–[32]. Though state-of-the-art LiDAR-based approaches perform well in structured urban or indoor scenes, with localization accuracy better than most vision-based methods, their lack of versatility hinders them from being widely used in industry. Most methods are based on sensor-dependent point cloud representations such as rings [10], [20] and range images [14]–[18], which require detailed LiDAR model parameters such as the scan-line distribution and the field of view. It is non-trivial to directly adapt them to recently developed mechanical LiDARs with unique beam configurations or to solid-state LiDARs with a limited field of view [21]. Besides, some SLAM systems target specific application environments such as urban roads [15], [33], highways [34], coal mines [35], and forests [22], using ad-hoc frameworks. They tend to get stuck in degenerated corner cases or upon a rapid switch of the scene.
Fig. 1. Overview of the proposed MULLS-SLAM system: (a) the point cloud from a single scan of a multi-beam spinning LiDAR, (b) various types of feature points (ground, facade, pillar, beam) extracted from the scan, (c) registered point cloud of the local map, (d) feature points of the local map, (e) global registration between revisited submaps by TEASER [36], (f) loop closure edges among submaps, (g) generated consistent map and trajectory.

In this work, we propose a LiDAR-only self-contained SLAM system (as shown in Fig. 1) to cope with the aforementioned challenges. The continuous inputs of the system are sets of 3D coordinates (optionally with point-wise intensity and timestamp) without conversion to rings or range images. Therefore the system is independent of the LiDAR's specification and does not lose 3D structure information. Geometric feature points of plentiful categories such as ground, facade, pillar, beam, and roof are extracted from each frame, giving rise to better adaptability to the surroundings. The ego-motion is estimated efficiently by the proposed multi-metric linear least square (MULLS) iterative closest points (ICP) algorithm based on the classified feature points. Furthermore, a submap-based pose graph optimization is employed as the back-end, with loop closure constraints constructed via map-to-map global registration. The main contributions of this work are listed as follows:
• A scan-line independent LiDAR-only SLAM solution named MULLS, with low drift and real-time performance on various scenarios. Currently, MULLS ranks top 10 on the competitive KITTI benchmark (https://github.com/YuePanEdward/MULLS).
• An efficient point cloud local registration algorithm named MULLS-ICP that realizes the linear least square optimization of point-to-point (plane, line) error metrics jointly on roughly classified geometric feature points.
II. RELATED WORKS
As the vital operation of LiDAR SLAM, point cloud registration can be categorized into local and global registration. Local registration requires a good initial guess of the transformation to finely align two overlapping scans without getting stuck in a local minimum, while global registration can coarsely align them regardless of their relative pose.

Local registration based on ICP [37] has been widely used in LO to estimate the relatively small scan-to-scan (map) transformation. Classic ICP alternates between solving for dense point-to-point closest correspondences and solving for the rigid transformation. To guarantee LO's real-time performance (mostly at 10 Hz) in real-life regularized scenarios, faster and more robust variants of ICP have been proposed in recent years, focusing on the sampling of points for matching and on the error metrics [38]–[40] used for optimization. As a pioneering work of sparse point-based LO, LOAM [10] selects edge and planar points by sorting the curvature on each scan-line. The squared sum of point-to-line (plane) distances is minimized by non-linear optimization to estimate the transformation. As a follow-up, LeGO-LOAM [14] conducts ground segmentation and isotropic edge point extraction from the projected range image. Then a two-step non-linear optimization is executed on ground and edge correspondences to solve two sets of transformation coefficients successively. Unlike LOAM, SuMa [15] is based on dense projective normal ICP between the range image of the current scan and the surfel map. SuMa++ [16] further improves SuMa by realizing dynamic filtering and multi-class projective matching with the assistance of semantic masks. These methods share the following limits. First, they require the LiDAR's model and lose 3D information due to operating on range images or scan-lines; [11], [25] operate on the raw point cloud but lose efficiency. Second, ICP with different error metrics is solved by less efficient non-linear optimization. Though efficient linear least square optimization for the point-to-plane metric has been applied to LO [41]–[43], a joint linear solution to point-to-point (plane, line) metrics has not been proposed. These limits are addressed in our work.

Global registration is often required for constraint construction in LiDAR SLAM's back-end, especially when the odometry drifts too much to close the loop by ICP. One solution is to apply a correlation operation on global features [28]–[32] encoded from a pair of scans to get a yaw angle estimation. Other solutions solve the 6DOF transformation based on local feature matching and verification among keypoints [44] or segments [27]. Our system follows these solutions by adopting the recently proposed TEASER [36] algorithm. By applying truncated least square estimation with semi-definite relaxation [45], TEASER is more efficient and robust than methods based on RANSAC [44], [46] and branch & bound (BnB) [47], [48].
Fig. 2. Pipeline of geometric feature points extraction and encoding
III. METHODOLOGY
A. Motion compensation
Given the point-wise timestamps of a frame, the time ratio for a point $p_i$ with timestamp $\tau_i$ is $s_i = \frac{\tau_e - \tau_i}{\tau_e - \tau_b}$, where $\tau_b$, $\tau_e$ are the timestamps at the start and the end of the frame. When an IMU is not available, the transformation of $p_i$ to $p_i^{(e)}$ can be computed under the assumption of uniform motion:

$p_i^{(e)} = R_{e,i} p_i + t_{e,i} \approx \mathrm{slerp}(R_{e,b}, s_i)\, p_i + s_i\, t_{e,b}$,   (1)

where slerp represents the spherical linear interpolation, and $R_{e,b}$ and $t_{e,b}$ stand for the rotation and translation estimated throughout the frame. The undistorted point cloud is obtained by packing all the points $\{p_i^{(e)}\}$ at the end of the frame.
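To make (1) concrete, the following is a minimal numpy/scipy sketch of this uniform-motion de-skewing; the function name and arguments are illustrative placeholders, not part of MULLS.

```python
# A minimal sketch of the uniform-motion de-skewing in Eq. (1).
# `R_eb` (3x3) and `t_eb` (3,) are the estimated rotation and translation
# over the frame; `points` is (N, 3) and `stamps` is (N,).
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def deskew(points, stamps, R_eb, t_eb, tau_b, tau_e):
    # s_i = (tau_e - tau_i) / (tau_e - tau_b): 1 at frame start, 0 at frame end
    s = (tau_e - stamps) / (tau_e - tau_b)
    # spherical linear interpolation between identity (s = 0) and R_eb (s = 1)
    key_rots = Rotation.from_matrix(np.stack([np.eye(3), R_eb]))
    R_i = Slerp([0.0, 1.0], key_rots)(s).as_matrix()   # (N, 3, 3)
    # p_i^(e) = slerp(R_eb, s_i) p_i + s_i * t_eb
    return np.einsum('nij,nj->ni', R_i, points) + s[:, None] * t_eb
```

A point captured at the beginning of the frame ($s_i = 1$) receives the full frame motion, while a point captured at the very end ($s_i = 0$) is left untouched, so all points are expressed at the frame-end pose.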
B. Geometric feature points extraction and encoding

The workflow of this section is summarized in Fig. 2. The input is the raw point cloud of each frame, and the outputs are six types of feature points with primary and normal vectors.

Dual-threshold ground filtering: The point cloud, after optional preprocessing, is projected onto a reference plane, which is horizontal or fitted from the last frame's ground points. The reference plane is then divided into equal-size 2D grids. The minimum height of the points in each grid $g_i$ and in its $3 \times 3$ neighboring grids, denoted as $h^{(i)}_{min}$ and $h^{(i)}_{neimin}$, are recorded respectively. With two thresholds $\delta h_1$, $\delta h_2$, each point $p_k$ in grid $g_i$ is classified as a roughly determined ground point $\mathcal{G}_{rough}$ or a nonground point $\mathcal{NG}$ by:

$p^{(i)}_k \in \begin{cases} \mathcal{NG}, & \text{if } h_k - h^{(i)}_{min} > \delta h_1 \text{ or } h^{(i)}_{min} - h^{(i)}_{neimin} > \delta h_2 \\ \mathcal{G}_{rough}, & \text{otherwise} \end{cases}$   (2)

To refine the ground points, RANSAC is employed in each grid to fit the grid-wise ground plane. Inliers are kept as ground points $\mathcal{G}$, and their normal vectors $n$ are defined as the surface normals of the grid-wise best-fit planes.

Nonground points classification based on PCA: Nonground points are downsampled to a fixed number and then fed into the principal components analysis (PCA) module in parallel. The point-wise K-R neighborhood $\mathcal{N}$ is defined as the nearest $K$ points within a sphere of radius $R$. The covariance matrix $C$ of $\mathcal{N}$ is calculated as $C = \frac{1}{|\mathcal{N}|} \sum_{i \in \mathcal{N}} (p_i - \bar{p})(p_i - \bar{p})^T$, where $\bar{p}$ is the center of gravity of $\mathcal{N}$. Next, the eigenvalues $\lambda_1 > \lambda_2 > \lambda_3$ and the corresponding eigenvectors $v$, $m$, $n$ are determined by the eigenvalue decomposition of $C$, where $v$, $n$ are the primary and the normal vector of $\mathcal{N}$. Then the local linearity $\sigma_{1D}$, planarity $\sigma_{2D}$, and curvature $\sigma_c$ [49] are defined as $\sigma_{1D} = \frac{\lambda_1 - \lambda_2}{\lambda_1}$, $\sigma_{2D} = \frac{\lambda_2 - \lambda_3}{\lambda_1}$, $\sigma_c = \frac{\lambda_3}{\lambda_1 + \lambda_2 + \lambda_3}$. According to the size of the local features $\sigma_{1D}$, $\sigma_{2D}$, $\sigma_c$ and the direction of $v$, $n$, five categories of feature points can be extracted, namely facade $\mathcal{F}$, roof $\mathcal{R}$, pillar $\mathcal{P}$, beam $\mathcal{B}$, and vertex $\mathcal{V}$. To refine the feature points, non-maximum suppression (NMS) based on $\sigma_{1D}$, $\sigma_{2D}$, $\sigma_c$ is applied on the linear points ($\mathcal{P}$, $\mathcal{B}$), the planar points ($\mathcal{F}$, $\mathcal{R}$) and the vertices $\mathcal{V}$ respectively, followed by an isotropic downsampling. Together with $\mathcal{G}$, the roughly classified feature points are packed for registration (see the code sketch at the end of this subsection).

Neighborhood category context encoding: Based on the extracted feature points, the neighborhood category context (NCC) is proposed to roughly describe each vertex keypoint with almost no additional computation. As shown in (3), the proportions of feature points of different categories in the neighborhood, the normalized intensity, and the height above ground are encoded. The NCC is later used as the local feature for global registration in the SLAM system's back-end (III-E):

$f^{(ncc)}_i = \left[ \tfrac{|\mathcal{F}_\mathcal{N}|}{|\mathcal{N}|} \;\; \tfrac{|\mathcal{P}_\mathcal{N}|}{|\mathcal{N}|} \;\; \tfrac{|\mathcal{B}_\mathcal{N}|}{|\mathcal{N}|} \;\; \tfrac{|\mathcal{R}_\mathcal{N}|}{|\mathcal{N}|} \;\; \tfrac{\bar{I}_\mathcal{N}}{I_{max}} \;\; \tfrac{h_\mathcal{G}}{h_{max}} \right]^T_i$.   (3)
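The following is a minimal numpy/scipy sketch of the per-point PCA attributes described above; the neighborhood parameters are illustrative, and the subsequent thresholding on these attributes and on the directions of $v$, $n$ (e.g., near-vertical $v$ for pillars, near-horizontal $n$ for facades) is omitted.

```python
# A minimal sketch of the K-R neighborhood PCA used for nonground
# point classification; `points` is an (N, 3) numpy array.
import numpy as np
from scipy.spatial import cKDTree

def pca_attributes(points, k=25, radius=1.0):
    tree = cKDTree(points)
    attrs = []
    for p in points:
        # K-R neighborhood: nearest k points within a sphere of radius R
        dist, idx = tree.query(p, k=k, distance_upper_bound=radius)
        nbrs = points[idx[np.isfinite(dist)]]
        c = nbrs - nbrs.mean(axis=0)
        # covariance C = 1/|N| * sum (p_i - pbar)(p_i - pbar)^T
        C = c.T @ c / len(nbrs)
        lam, vec = np.linalg.eigh(C)           # eigenvalues in ascending order
        l3, l2, l1 = lam                       # lambda_1 >= lambda_2 >= lambda_3
        v, n = vec[:, 2], vec[:, 0]            # primary and normal vector
        linearity = (l1 - l2) / l1             # sigma_1D
        planarity = (l2 - l3) / l1             # sigma_2D
        curvature = l3 / (l1 + l2 + l3)        # sigma_c
        attrs.append((linearity, planarity, curvature, v, n))
    return attrs
```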
C. Multi-metric linear least square ICP

This section's pipeline is summarized in Fig. 3. The inputs are the source and the target point cloud, composed of the multi-class feature points extracted in III-B, as well as an initial guess $T^{guess}_{t,s}$ of the transformation from the source to the target point cloud. After the iterative procedure of ICP [37], the outputs are the final transformation estimate $T_{t,s}$ and its accuracy evaluation indices.

Fig. 3. Pipeline of the multi-metric linear least square ICP
Fig. 4. Overview of three different distance error metrics: (a) point to point, (b) point to plane, (c) point to line

Multi-class closest point association: In each iteration, the closest point correspondences are determined by nearest neighbor (NN) searching within each feature point category ($\mathcal{G}$, $\mathcal{F}$, $\mathcal{R}$, $\mathcal{P}$, $\mathcal{B}$, $\mathcal{V}$) under an ever-decreasing distance threshold. Direction consistency checks of the normal vector $n$ and the primary vector $v$ are further applied on the planar points ($\mathcal{G}$, $\mathcal{F}$, $\mathcal{R}$) and the linear points ($\mathcal{P}$, $\mathcal{B}$) respectively. Constraints on category, distance and direction make the point association more robust and efficient.

Multi-metric transformation estimation: Suppose $q_i$, $p_i$ are corresponding points in the source and the target point cloud. The residual $r_i$ after a certain rotation $R$ and translation $t$ of the source point cloud is defined as:

$r_i = q_i - p'_i = q_i - (R p_i + t)$.   (4)

With the multi-class correspondences, we expect to estimate the optimal transformation $\{R^*, t^*\}$ that jointly minimizes the point-to-point, point-to-plane and point-to-line distances between the vertex ($\mathcal{V}$), planar ($\mathcal{G}$, $\mathcal{F}$, $\mathcal{R}$) and linear ($\mathcal{P}$, $\mathcal{B}$) point correspondences. As shown in Fig. 4, the point-to-point (plane, line) distances can be calculated as $d^{po \to po}_i = \|r_i\|$, $d^{po \to pl}_j = r_j \cdot n_j$, and $d^{po \to li}_k = \|r_k \times v_k\|$ respectively from the residual, normal and primary vectors ($r$, $n$, $v$). Thus, the transformation estimation is formulated as a weighted least square optimization problem with the following target function:

$\{R^*, t^*\} = \underset{\{R,t\}}{\operatorname{argmin}} \sum_i w_i \left(d^{po \to po}_i\right)^2 + \sum_j w_j \left(d^{po \to pl}_j\right)^2 + \sum_k w_k \left(d^{po \to li}_k\right)^2$,   (5)

where $w$ is the weight of each correspondence. Under the tiny angle assumption, $R$ can be linearized as:

$R \approx \begin{bmatrix} 1 & -\gamma & \beta \\ \gamma & 1 & -\alpha \\ -\beta & \alpha & 1 \end{bmatrix} = [\alpha \; \beta \; \gamma]^T_\times + I_3$,   (6)

where $\alpha$, $\beta$ and $\gamma$ are roll, pitch and yaw, respectively, under the $x$-$y'$-$z''$ Euler angle convention. Regarding the unknown parameter vector as $\xi = [t_x \; t_y \; t_z \; \alpha \; \beta \; \gamma]^T$, (5) becomes a weighted linear least square problem. It is solved by Gauss-Markov parameter estimation with the functional model $e = A\xi - b$ and the stochastic model $D\{e\} = \sigma_0^2 P^{-1}$, where $\sigma_0$ is a prior standard deviation and $P = \mathrm{diag}(w_1, \cdots, w_n)$ is the weight matrix. Deduced from (5), the overall design matrix $A \in \mathbb{R}^{n \times 6}$ and observation vector $b \in \mathbb{R}^{n \times 1}$ are formulated as:

$A = \left[ \cdots \; A^{po \to po\,T}_i \cdots A^{po \to pl\,T}_j \cdots A^{po \to li\,T}_k \cdots \right]^T, \quad b = \left[ \cdots \; b^{po \to po\,T}_i \cdots b^{po \to pl\,T}_j \cdots b^{po \to li\,T}_k \cdots \right]^T$,   (7)

where the components of $A$ and $b$ for each correspondence under the point-to-point (plane, line) error metric are defined as:

$A^{po \to po}_i = \left[ I_3 \;\; -[p_i]_\times \right]$, $\quad b^{po \to po}_i = q_i - p_i$,
$A^{po \to pl}_j = \left[ n^T_j \;\; (p_j \times n_j)^T \right]$, $\quad b^{po \to pl}_j = n^T_j (q_j - p_j)$,
$A^{po \to li}_k = \left[ [v_k]_\times \;\; (v^T_k p_k) I_3 - p_k v^T_k \right]$, $\quad b^{po \to li}_k = v_k \times (q_k - p_k)$.   (8)

Therefore, the estimate of the unknown vector $\hat{\xi}$ is:

$\hat{\xi} = \underset{\xi}{\operatorname{argmin}}\, (A\xi - b)^T P (A\xi - b) = \left(A^T P A\right)^{-1} \left(A^T P b\right) = \left(\sum_{i=1}^{n} A^T_i w_i A_i\right)^{-1} \sum_{i=1}^{n} A^T_i w_i b_i$,   (9)

where both $A^T P A$ and $A^T P b$ can be accumulated efficiently from each correspondence in parallel. Finally, the transformation matrix $\hat{T}\{\hat{R}, \hat{t}\}$ is reconstructed from $\hat{\xi}$.

Multi-strategy weighting function: Throughout the iterative procedure, a multi-strategy weighting function based on residuals, balanced direction contribution and intensity consistency is employed. The weight $w_i$ for each correspondence is defined as $w_i = w^{(residual)}_i \times w^{(balanced)}_i \times w^{(intensity)}_i$, whose multipliers are explained as follows.

First, to better deal with outliers, we present a residual weighting function deduced from the general robust kernel function covering a family of M-estimators [50]:

$w^{(residual)}_i = \begin{cases} 1, & \text{if } \kappa = 2 \\ \frac{2}{\epsilon_i^2 + 2}, & \text{if } \kappa = 0 \\ \left(\frac{\epsilon_i^2}{|\kappa - 2|} + 1\right)^{\kappa/2 - 1}, & \text{otherwise} \end{cases}$   (10)

where $\kappa$ is the coefficient for the kernel's shape, $\epsilon_i = d_i / \delta$ is the normalized residual and $\delta$ is the inlier noise threshold. Although an adaptive search for the best $\kappa$ is feasible [51], it would be time-consuming. Therefore, we fix $\kappa = 1$ in practice, leading to a pseudo-Huber kernel function.

Second, the contribution of the correspondences is not always balanced along the x, y, z directions. The following weighting function, considering the number of correspondences from each category, is proposed to guarantee observability:

$w^{(balanced)}_i = \begin{cases} \frac{|\mathcal{F}| + 2(|\mathcal{P}| + |\mathcal{B}|)}{|\mathcal{G}| + |\mathcal{R}|}, & i \in \mathcal{G} \text{ or } i \in \mathcal{R} \\ 1, & \text{otherwise} \end{cases}$   (11)

Third, since the intensity channel provides extra information for registration, (12) is devised to penalize the contribution of correspondences with large intensity inconsistency:

$w^{(intensity)}_i = e^{-|\Delta I_i| / I_{max}}$.   (12)

Multi-index registration quality evaluation: After the convergence of ICP, the posterior standard deviation $\hat{\sigma}$ and the information matrix $I$ of the registration are calculated as:

$\hat{\sigma}^2 = \frac{1}{n - 6} \left(A\hat{\xi} - b\right)^T P \left(A\hat{\xi} - b\right), \quad I = \frac{1}{\hat{\sigma}^2} \left(A^T P A\right)$,   (13)

where $\hat{\sigma}$ and $I$, together with the nonground point overlapping ratio $O_{ts}$, are used for evaluating the registration quality.
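Below is a minimal numpy sketch of the core linear solve: the row construction of (8), the residual weight of (10) with the fixed $\kappa = 1$, and the normal-equation accumulation of (9). Function names are illustrative; this is a sketch of the math, not the reference implementation.

```python
# A minimal sketch of the multi-metric linear least square solve of Eqs. (5)-(10),
# assuming the parameter ordering xi = [t_x, t_y, t_z, alpha, beta, gamma].
import numpy as np

def skew(x):
    """Skew-symmetric matrix [x]_x such that [x]_x y = x cross y."""
    return np.array([[0, -x[2], x[1]], [x[2], 0, -x[0]], [-x[1], x[0], 0]])

def row_po2po(p, q):                       # point-to-point: 3 rows of A, b
    return np.hstack([np.eye(3), -skew(p)]), q - p

def row_po2pl(p, q, n):                    # point-to-plane: 1 row of A, b
    return np.hstack([n, np.cross(p, n)]).reshape(1, 6), np.atleast_1d(n @ (q - p))

def row_po2li(p, q, v):                    # point-to-line: 3 rows of A, b
    A = np.hstack([skew(v), (v @ p) * np.eye(3) - np.outer(p, v)])
    return A, np.cross(v, q - p)

def w_residual(d, delta, kappa=1.0):
    """Robust weight of Eq. (10); kappa = 1 gives the pseudo-Huber kernel."""
    eps2 = (d / delta) ** 2
    if kappa == 2:
        return 1.0
    if kappa == 0:
        return 2.0 / (eps2 + 2.0)
    return (eps2 / abs(kappa - 2.0) + 1.0) ** (kappa / 2.0 - 1.0)

def solve_xi(corrs):
    """corrs: iterable of (A_i, b_i, w_i) blocks; returns xi_hat of Eq. (9)."""
    AtPA, AtPb = np.zeros((6, 6)), np.zeros(6)
    for A_i, b_i, w_i in corrs:
        AtPA += w_i * A_i.T @ A_i          # accumulate A^T P A
        AtPb += w_i * A_i.T @ b_i          # accumulate A^T P b
    return np.linalg.solve(AtPA, AtPb)
```

Because each correspondence only contributes a small fixed-size product, $A^T P A$ and $A^T P b$ can be accumulated in parallel, which is where the efficiency advantage over per-iteration non-linear solvers comes from.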
D. MULLS front-end
Fig. 5. Overall workflow of MULLS-SLAM: (a) front-end, (b) back-end.
Fig. 6. Inter-submap and inner-submap pose graph optimization: (a) before optimization, (b) step 1: inter-submap PGO, (c) step 2: inner-submap PGO (nodes: submap reference frames, static frames, frames without optimization, optimized frames; edges: adjacent edges and loop closure edges).

As shown in Fig. 5(a), the front-end of MULLS is a combination of III-B, III-C and a map management module. For an incoming scan, roughly classified feature points are extracted and further downsampled into a sparser group of points for efficiency. Besides, a local map containing static feature points from historical frames is maintained with the reference pose of the last frame. Taking the last frame's ego-motion as an initial guess, the scan-to-scan MULLS-ICP is conducted between the sparser feature points of the current frame and the denser feature points of the last frame with only a few iterations. The estimated transformation is provided as a better initial guess for the following scan-to-map MULLS-ICP, which regards the feature points in the local map as the target point cloud and keeps iterating until convergence. The sparser group of feature points is appended to the local map after map-based dynamic object filtering: within a distance threshold to the scanner, those nonground feature points far away from their nearest neighbors of the same category in the local map are filtered out (see the sketch below). The local map is then cropped with a fixed radius. Only the denser group of feature points of the current frame is kept for the next epoch.
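As referenced above, here is a minimal numpy/scipy sketch of the map-based dynamic object filtering for a single feature category (MULLS applies it per category); the threshold values are illustrative placeholders, not MULLS parameters.

```python
# A minimal sketch of map-based dynamic object filtering: near the scanner,
# frame points with no close same-category neighbor in the local map are
# treated as dynamic and removed.
import numpy as np
from scipy.spatial import cKDTree

def dynamic_filter(frame_pts, map_pts, scanner_pos, near_r=30.0, nn_thre=0.5):
    d_scan = np.linalg.norm(frame_pts - scanner_pos, axis=1)
    d_map, _ = cKDTree(map_pts).query(frame_pts, k=1)
    # keep points that are either beyond the filtering range
    # or supported by a close neighbor in the static local map
    keep = (d_scan > near_r) | (d_map < nn_thre)
    return frame_pts[keep]
```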
E. MULLS back-end
Fig. 7. MULLS's result on an urban scenario (KITTI seq. 00): (a) overview, (b, c) map details of loop closure areas, (d) trajectory comparison.
Fig. 8. MULLS's result on a highway scenario (KITTI seq. 01): (a) overview, (b) map and trajectory (black: ground truth) at the end of the sequence, (c) lane-level final drift, (d) map and trajectory of a challenging scene with high similarity and few features, (e) trajectory comparison (almost overlapped).

As shown in Fig. 5(b), the periodically saved submap is the processing unit of the back-end. The adjacent and loop closure edges among the submaps are constructed and verified by the certifiable and efficient TEASER [36] global registration. Its initial correspondences are determined according to the cosine similarity of the NCC features (encoded in III-B) among the vertex keypoints. Taking TEASER's estimate as the initial guess, the map-to-map MULLS-ICP is used to refine the inter-submap edges with the accurate transformation and information matrices calculated in III-C. Edges with a higher $\hat{\sigma}$ or a lower $O_{ts}$ than the thresholds are deleted. As shown in Fig. 6, once a loop closure edge is added, the pose correction of the free submap nodes is achieved by the inter-submap PGO. Finally, the inner-submap PGO fixes each submap's reference frame and adjusts the poses of the other frames.
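A minimal sketch of how such initial correspondences can be drawn from the cosine similarity of the NCC descriptors of (3), assuming numpy; the similarity threshold is an illustrative placeholder, not a MULLS parameter.

```python
# A minimal sketch of seeding the TEASER global registration with vertex
# keypoint correspondences matched by NCC cosine similarity.
import numpy as np

def ncc_match(desc_src, desc_tgt, min_cos=0.9):
    """desc_*: (N, 6) NCC descriptor arrays; returns (src_idx, tgt_idx) pairs."""
    a = desc_src / np.linalg.norm(desc_src, axis=1, keepdims=True)
    b = desc_tgt / np.linalg.norm(desc_tgt, axis=1, keepdims=True)
    sim = a @ b.T                        # pairwise cosine similarity
    j = sim.argmax(axis=1)               # best target match per source keypoint
    keep = sim[np.arange(len(a)), j] >= min_cos
    return np.flatnonzero(keep), j[keep]
```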
IV. EXPERIMENTS
The proposed MULLS system is evaluated qualitatively and quantitatively on the KITTI, MIMAP and HESAI datasets, covering various outdoor and indoor scenes captured by six different types of LiDAR, as summarized in Table I. All the experiments are conducted on a PC equipped with an Intel Core i7@2.8GHz CPU for fair comparison.
A. Quantitative experiment
The quantitative evaluations are done on the KITTI Odometry/SLAM dataset [52], which is composed of 22 sequences with more than 43k scans captured by a Velodyne HDL-64E LiDAR. KITTI covers various outdoor scenarios such as urban roads, country roads, and highways. Seq. 00-10 are provided with GNSS-INS ground truth poses, while seq. 11-21 are used as the test set for online evaluation and the leaderboard, without provided ground truth. Following the odometry evaluation criterion in [52], the average translation and rotation errors (ATE and ARE) are used for localization accuracy evaluation. The performance of the proposed MULLS-LO (w/o loop closure) and MULLS-SLAM (w/ loop closure), as well as seven state-of-the-art LiDAR SLAM solutions (results taken from their original papers and the KITTI leaderboard), is reported in Table II. MULLS-LO achieves the best ATE (0.49%) on seq. 00-10 and is the runner-up on the online test set. With loop closure and PGO, MULLS-SLAM gets the best ARE (0.12°/100m) but a worse ATE. This is because ATE and ARE cannot fully reflect the improvement in global positioning accuracy, since they are measured locally within a distance of up to 800 m. Besides, the maps and trajectories constructed by MULLS for two representative sequences are shown in Fig. 7 and 8. Specifically, our method surpasses state-of-the-art methods by a large margin on the featureless and highly dynamic highway sequence, with only a lane-level drift at its end, as shown in Fig. 8(c).

TABLE I: The proposed method is evaluated on three datasets collected by various LiDARs in various scenarios (columns: Dataset, LiDAR, Scenario).

Fig. 9. Mapping error distribution (µ = 6.7 cm) between the point cloud generated from TLS and MULLS.
Fig. 10. Overview of MULLS's results on the HESAI dataset: (a) MULLS-SLAM map aligned with Google Earth, (b-d) examples of the generated point cloud maps of the urban expressway, industry park, and indoor scenario.

B. Qualitative experiments
1) MIMAP: The ISPRS Multi-sensor Indoor Mapping and Positioning dataset [53] is collected by a backpack mapping system with a Velodyne VLP32C and an HDL32E in a 5-floor building. A high accuracy terrestrial laser scanning (TLS) point cloud of floors 1 and 2, scanned by a Riegl VZ-1000, is regarded as the ground truth for map quality evaluation. Fig. 9 demonstrates MULLS's indoor mapping quality, with a 6.7 cm mean mapping error, which is calculated as the mean distance between each point in the map point cloud and its nearest neighbor in the TLS point cloud (see the sketch below).

2) HESAI: The HESAI dataset is collected by three kinds of mechanical LiDAR (Pandar128, XT, and QT Lite). Since ground truth poses or maps are not provided, the qualitative results, i.e., the consistent maps built by MULLS-SLAM, indicate MULLS's ability to adapt to LiDARs with different numbers and distributions of beams in both indoor and outdoor scenes, as shown in Fig. 10.
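A minimal sketch of the mapping-error metric referenced above, assuming numpy/scipy point arrays of shape (N, 3):

```python
# Mean mapping error: mean distance from each point of the SLAM map to its
# nearest neighbor in the TLS ground truth cloud.
import numpy as np
from scipy.spatial import cKDTree

def mean_mapping_error(map_pts, tls_pts):
    dist, _ = cKDTree(tls_pts).query(map_pts, k=1)
    return float(dist.mean())
```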
C. Ablation studies
Further, an in-depth performance evaluation of the adopted categories of geometric feature points and of the weighting functions is conducted to verify the proposed method's effectiveness. Table III shows that vertex point-to-point correspondences undermine the LO's accuracy on both sequences, because points from regular structures are more reliable than those with high curvature on vegetation. Therefore, vertex points are only used in the back-end global registration in practice. Besides, linear points (pillar and beam) are necessary for supreme performance in the highway scene: points on guardrails and curbs are extracted as linear points to impose cross-direction constraints. As shown in Table IV, all three weighting functions presented in III-C are proven effective on both sequences. Specifically, the translation estimation realizes an obvious improvement on featureless highway scenes when intensity consistency is taken into account.

TABLE II: Quantitative evaluation on the KITTI dataset. Red and blue fonts denote the first and second place, respectively. ATE in %, ARE in °/100m. (U: urban road, H: highway, C: country road, *: with loop closure, -: not available)

Approach          Error  00(U*) 01(H)  02(C*) 03(C)  04(C)  05(C*) 06(U*) 07(U*) 08(U*) 09(C*) 10(C)  00-10 mean  online mean  runtime (s)
MULLS (ours)      ATE    -      -      -      -      -      -      -      -      -      -      -      0.49        -            0.10
MULLS-SLAM (ours) ARE    -      -      -      -      -      -      -      -      -      -      -      0.12        -            -
LOAM [10]         ATE    0.78   1.43   0.92   0.86   0.71   0.57   0.65   0.63   1.12   0.77   0.79   0.84        -            -
IMLS-SLAM [11]    ATE    -      -      -      -      -      -      -      -      -      -      -      -           -            -
S4-SLAM [25]      ATE    0.62   1.11   1.63   0.82   0.95   0.50   0.65   0.60   1.33   1.05   0.96   0.92        0.93         0.20
S4-SLAM [25]      ARE    -      -      -      -      -      -      -      -      -      -      -      -           0.38         -
SuMa++ [16]       ATE    0.64   1.60   1.00   0.67   0.37   0.40   0.46   0.34   1.10   -      -      -           -            -

TABLE III: Ablation study w.r.t. geometric feature points, reporting ATE/ARE on KITTI 00 (U) and KITTI 01 (H) (-: LiDAR odometry failed). The six tested feature configurations:

po→pl: G  F   po→li: P  B   po→po: V
       ✓  ✓          ✓  ✓          ✗
       ✓  ✓          ✓  ✗          ✗
       ✓  ✓          ✗  ✗          ✗
       ✓  ✗          ✓  ✗          ✗
       ✓  ✗          ✓  ✓          ✗
       ✓  ✓          ✓  ✓          ✓

TABLE IV: Ablation study w.r.t. the weighting functions, reporting ATE/ARE on KITTI 00 (U) and KITTI 01 (H). The tested combinations:

w(bal.)  w(res.)  w(int.)
✓        ✓        ✓
✗        ✗        ✗
✗        ✓        ✓
✓        ✗        ✓
✓        ✓        ✗
D. Runtime analysis
Besides, a detailed analysis of the average runtime and iteration count of the registration under two different parameter settings is performed. As shown in Table V, $T_{reg}$ is the total runtime for registration, while $t_{corr}$, $t_{lls}$, $t_{lma}$ and $t_{lmad}$ are the average runtimes per iteration for correspondence determination, linear least square estimation (used by MULLS), Levenberg-Marquardt (LM) analytical optimization (used by LOAM [10]) and LM with auto-derivation (used by A-LOAM, https://github.com/HKUST-Aerial-Robotics/A-LOAM), respectively. It is shown that linear least square is more than five times faster than non-linear optimization methods such as LM. Moreover, Fig. 11 shows MULLS's detailed timing report on four typical sequences from the KITTI and HESAI datasets with different numbers of incoming points per frame. MULLS operates faster than 10 Hz on average on all four sequences, though it sometimes runs beyond 100 ms on sequences with many loops, such as in Fig. 11(a). Since the loop closure can be implemented in another thread, the real-time property of MULLS on a moderate PC can be guaranteed.

TABLE V: Runtime analysis for the registration module (the first two columns indicate the average feature point numbers).

Source  Target  T_reg  t_corr  t_lls  t_lma  t_lmad
897     6421    11.8   16.35   1.34

V. CONCLUSIONS
In this work, we present a versatile LiDAR-only SLAM system named MULLS, which is based on the efficient multi-metric linear least square ICP. Experiments on more than 100 thousand frames collected by six different LiDARs show that MULLS manages to achieve real-time performance with low drift and high map quality on various challenging outdoor and indoor scenarios regardless of the LiDAR specifications.

REFERENCES
[1] H. Temeltas and D. Kayak, "Slam for robot navigation," IEEE Aerospace and Electronic Systems Magazine, vol. 23, no. 12, pp. 16–19, 2008.
[2] K. Ebadi, Y. Chang, M. Palieri, A. Stephens, A. Hatteland, E. Heiden, A. Thakur, N. Funabiki, B. Morrell, S. Wood, et al., "LAMP: Large-scale autonomous mapping and positioning for exploration of perceptually-degraded subterranean environments," in IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2020, pp. 80–86.
[3] W.-C. Ma, I. Tartavull, I. A. Bârsan, S. Wang, M. Bai, G. Mattyus, N. Homayounfar, S. K. Lakshmikanth, A. Pokrovsky, and R. Urtasun, "Exploiting sparse semantic HD maps for self-driving vehicle localization," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2019, pp. 5304–5311.
[4] R. Mur-Artal, J. M. M. Montiel, and J. D. Tardos, "ORB-SLAM: A versatile and accurate monocular SLAM system," IEEE Transactions on Robotics, vol. 31, no. 5, pp. 1147–1163, 2015.
[5] C. Forster, M. Pizzoli, and D. Scaramuzza, "SVO: Fast semi-direct monocular visual odometry," in IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2014, pp. 15–22.
[6] T. Qin, P. Li, and S. Shen, "VINS-Mono: A robust and versatile monocular visual-inertial state estimator," IEEE Transactions on Robotics, vol. 34, no. 4, pp. 1004–1020, 2018.
[7] S. Izadi, D. Kim, O. Hilliges, D. Molyneaux, R. Newcombe, P. Kohli, J. Shotton, S. Hodges, D. Freeman, A. Davison, et al., "KinectFusion: Real-time 3D reconstruction and interaction using a moving depth camera," in Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, 2011, pp. 559–568.
[8] T. Whelan, M. Kaess, H. Johannsson, M. Fallon, J. J. Leonard, and J. McDonald, "Real-time large-scale dense RGB-D SLAM with volumetric fusion," The International Journal of Robotics Research, vol. 34, no. 4-5, pp. 598–626, 2015.
[9] M. Labbé and F. Michaud, "RTAB-Map as an open-source lidar and visual simultaneous localization and mapping library for large-scale and long-term online operation," Journal of Field Robotics, vol. 36, no. 2, pp. 416–446, 2019.
[10] J. Zhang and S. Singh, "LOAM: Lidar odometry and mapping in real-time," in Robotics: Science and Systems, vol. 2, no. 9, 2014.
[11] J.-E. Deschaud, "IMLS-SLAM: Scan-to-model matching based on 3D data," in IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 2480–2485.
[12] J. Graeter, A. Wilczynski, and M. Lauer, "LIMO: Lidar-monocular visual odometry," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018, pp. 7872–7879.
[13] F. Neuhaus, T. Koß, R. Kohnen, and D. Paulus, "MC2SLAM: Real-time inertial lidar odometry using two-scan motion compensation," in German Conference on Pattern Recognition. Springer, 2018.
[14] T. Shan and B. Englot, "LeGO-LOAM: Lightweight and ground-optimized lidar odometry and mapping on variable terrain," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018, pp. 4758–4765.
[15] J. Behley and C. Stachniss, "Efficient surfel-based SLAM using 3D laser range data in urban environments," in Robotics: Science and Systems, 2018.
[16] X. Chen, A. Milioto, E. Palazzolo, P. Giguère, J. Behley, and C. Stachniss, "SuMa++: Efficient lidar-based semantic SLAM," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2019, pp. 4530–4537.
[17] D. Kovalenko, M. Korobkin, and A. Minin, "Sensor aware lidar odometry," IEEE, 2019, pp. 1–6.
[18] Q. Li, S. Chen, C. Wang, X. Li, C. Wen, M. Cheng, and J. Li, "LO-Net: Deep real-time lidar odometry," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 8473–8482.
[19] H. Ye, Y. Chen, and M. Liu, "Tightly coupled 3D lidar inertial odometry and mapping," in IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2019, pp. 3144–3150.
[20] X. Zuo, P. Geneva, W. Lee, Y. Liu, and G. Huang, "LIC-Fusion: Lidar-inertial-camera odometry," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2019, pp. 5848–5854.
[21] J. Lin and F. Zhang, "Loam livox: A fast, robust, high-precision lidar odometry and mapping package for lidars of small FoV," in IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2020, pp. 3126–3131.
[22] S. W. Chen, G. V. Nardari, E. S. Lee, C. Qu, X. Liu, R. A. F. Romero, and V. Kumar, "SLOAM: Semantic lidar odometry and mapping for forest inventory," IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 612–619, 2020.
[23] T. Shan, B. Englot, D. Meyers, W. Wang, C. Ratti, and D. Rus, "LIO-SAM: Tightly-coupled lidar inertial odometry via smoothing and mapping," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2020.
[24] I. García Daza, M. Rentero, C. Salinas Maldonado, R. Izquierdo Gonzalo, and N. Hernández Parra, "Fail-aware lidar-based odometry for autonomous vehicles," Sensors, vol. 20, no. 15, p. 4097, 2020.
[25] B. Zhou, Y. He, K. Qian, X. Ma, and X. Li, "S4-SLAM: A real-time 3D lidar SLAM system for ground/watersurface multi-scene outdoor applications," Autonomous Robots, pp. 1–22, 2020.
[26] L. He, X. Wang, and H. Zhang, "M2DP: A novel 3D point cloud descriptor and its application in loop closure detection," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2016, pp. 231–237.
[27] R. Dubé, A. Cramariuc, D. Dugas, H. Sommer, M. Dymczyk, J. Nieto, R. Siegwart, and C. Cadena, "SegMap: Segment-based mapping and localization using data-driven descriptors," The International Journal of Robotics Research, vol. 39, no. 2-3, pp. 339–355, 2020.
[28] G. Kim and A. Kim, "Scan Context: Egocentric spatial descriptor for place recognition within 3D point cloud map," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018, pp. 4802–4809.
[29] H. Wang, C. Wang, and L. Xie, "Intensity scan context: Coding intensity and geometry relations for loop closure detection," in IEEE International Conference on Robotics and Automation (ICRA), 2020, pp. 2095–2101.
[30] J. Jiang, J. Wang, P. Wang, P. Bao, and Z. Chen, "LiPMatch: Lidar point cloud plane based loop-closure," IEEE Robotics and Automation Letters, vol. 5, no. 4, pp. 6861–6868, 2020.
[31] F. Liang, B. Yang, Z. Dong, R. Huang, Y. Zang, and Y. Pan, "A novel skyline context descriptor for rapid localization of terrestrial laser scans to airborne laser scanning point clouds," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 165, pp. 120–132, 2020.
[32] X. Chen, T. Läbe, A. Milioto, T. Röhling, O. Vysotska, A. Haag, J. Behley, C. Stachniss, and F. Fraunhofer, "OverlapNet: Loop closing for lidar-based SLAM," in Proc. Robot.: Sci. Syst., 2020.
[33] S. Li, G. Li, L. Wang, and Y. Qin, "SLAM integrated mobile mapping system in complex urban environments," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 166, pp. 316–332, 2020.
[34] S. Zhao, Z. Fang, H. Li, and S. Scherer, "A robust laser-inertial odometry and mapping method for large-scale highway environments," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2019, pp. 1285–1292.
[35] M. Li, H. Zhu, S. You, L. Wang, and C. Tang, "Efficient laser-based 3D SLAM for coal mine rescue robots," IEEE Access, vol. 7, pp. 14124–14138, 2018.
[36] H. Yang, J. Shi, and L. Carlone, "TEASER: Fast and certifiable point cloud registration," 2020. [Online]. Available: https://github.com/MIT-SPARK/TEASER-plusplus
[37] P. J. Besl and N. D. McKay, "Method for registration of 3-D shapes," in Sensor Fusion IV, vol. 1611, 1992, pp. 586–606.
[38] S. Rusinkiewicz and M. Levoy, "Efficient variants of the ICP algorithm," in Proceedings Third International Conference on 3-D Digital Imaging and Modeling. IEEE, 2001, pp. 145–152.
[39] A. Segal, D. Haehnel, and S. Thrun, "Generalized-ICP," in Robotics: Science and Systems, vol. 2, no. 4. Seattle, WA, 2009, p. 435.
[40] A. Censi, "An ICP variant using a point-to-line metric," in IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2008, pp. 19–25.
[41] K.-L. Low, "Linear least-squares optimization for point-to-plane ICP surface registration," Chapel Hill, University of North Carolina, vol. 4, no. 10, pp. 1–3, 2004.
[42] F. Pomerleau, F. Colas, and R. Siegwart, "A review of point cloud registration algorithms for mobile robotics," Foundations and Trends in Robotics, vol. 4, no. 1, pp. 1–104, 2015.
[43] T. Kühner and J. Kümmerle, "Large-scale volumetric scene reconstruction using lidar," in IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2020, pp. 6261–6267.
[44] R. B. Rusu, N. Blodow, and M. Beetz, "Fast point feature histograms (FPFH) for 3D registration," in IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2009, pp. 3212–3217.
[45] H. Yang, P. Antonante, V. Tzoumas, and L. Carlone, "Graduated non-convexity for robust spatial perception: From non-minimal solvers to global outlier rejection," IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 1127–1134, 2020.
[46] N. Mellado, D. Aiger, and N. J. Mitra, "Super 4PCS: Fast global pointcloud registration via smart indexing," in Computer Graphics Forum, vol. 33, no. 5. Wiley Online Library, 2014, pp. 205–215.
[47] J. Yang, H. Li, D. Campbell, and Y. Jia, "Go-ICP: A globally optimal solution to 3D ICP point-set registration," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 11, pp. 2241–2254, 2015.
[48] Z. Cai, T.-J. Chin, A. P. Bustos, and K. Schindler, "Practical optimal registration of terrestrial lidar scan pairs," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 147, pp. 118–131, 2019.
[49] T. Hackel, J. D. Wegner, and K. Schindler, "Fast semantic segmentation of 3D point clouds with strongly varying density," ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. 3, pp. 177–184, 2016.
[50] J. T. Barron, "A general and adaptive robust loss function," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 4331–4339.
[51] N. Chebrolu, T. Läbe, O. Vysotska, J. Behley, and C. Stachniss, "Adaptive robust kernels for non-linear least squares problems," arXiv preprint arXiv:2004.14938, 2020.
[52] A. Geiger, P. Lenz, and R. Urtasun, "Are we ready for autonomous driving? The KITTI vision benchmark suite," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2012.
[53] C. Wang, Y. Dai, N. El-Sheimy, C. Wen, G. Retscher, Z. Kang, and A. Lingua, "ISPRS benchmark on multisensory indoor mapping and positioning."