Intensity-SLAM: Intensity Assisted Localization and Mapping for Large Scale Environment
Han Wang, Chen Wang, and Lihua Xie

Abstract—Simultaneous Localization And Mapping (SLAM) is the task of estimating the robot location and reconstructing the environment based on observations from sensors such as LIght Detection And Ranging (LiDAR) and cameras. It is widely used in robotic applications such as autonomous driving and drone delivery. Traditional LiDAR-based SLAM algorithms mainly leverage the geometric features from the scene context, while the intensity information from LiDAR is ignored. Some recent deep-learning-based SLAM algorithms consider intensity features and train the pose estimation network in an end-to-end manner. However, they require significant data collection effort and their generalizability to environments other than the trained one remains unclear. In this paper we introduce intensity features to a SLAM system, and we propose a novel full SLAM framework that leverages both geometry and intensity features. The proposed SLAM involves both intensity-based front-end odometry estimation and intensity-based back-end optimization. Thorough experiments are performed, including both outdoor autonomous driving and indoor warehouse robot manipulation. The results show that the proposed method outperforms existing geometric-only LiDAR SLAM methods.
Index Terms—SLAM, localization, industrial robots
I. INTRODUCTION

LOCALIZATION is one of the fundamental and essential topics in robotics. With the development of the robotic industry, robot localization has become more challenging in the past decades: from known to unknown environments, from simple to complex environments, from static to dynamic environments [1], and from short-term to long-term localization [2]. Traditionally, fixed anchors (or routers) are set up in a pre-defined area and the robot pose is obtained by using the distances of the robot from multiple anchors, e.g., Ultra-Wide Band (UWB) [3] and Wi-Fi localization [4]. However, these methods rely on an external setup and are mainly used in small scale environments. To tackle the limitations of traditional localization methods, Simultaneous Localization And Mapping (SLAM) is introduced to estimate the robot pose from on-board sensors [5]. It is independent of an external setup and hence becomes promising in robotic applications.
Manuscript received: October 14, 2020; Revised: December 10, 2020; Accepted: January 30, 2021. This paper was recommended for publication by Editor Sven Behnke upon evaluation of the Associate Editor and Reviewers' comments. This work was supported by Delta-NTU Corporate Laboratory for Cyber-Physical Systems under the National Research Foundation Corporate Lab @ University Scheme.

Han Wang and Lihua Xie are with the School of Electrical and Electronic Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798. E-mail: {wang.han, elhxie}@ntu.edu.sg

Chen Wang is with the Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA. E-mail: [email protected]
Fig. 1: Example of localization and mapping from the proposed method on KITTI sequence 00. The localization result is shown in the right corner with the ground truth plotted in red and the trajectory plotted in green. Our method achieves much higher localization accuracy compared to geometric-only methods.

Based on the perception system used, SLAM can be further divided into Visual SLAM (V-SLAM) and LiDAR SLAM. Compared to Visual SLAM, LiDAR SLAM is more accurate and robust to environmental changes such as weather and illumination [6].

Conventional LiDAR SLAM methods such as LOAM [7] and HDL-Graph-SLAM [8] mainly focus on geometric-only information to minimize point cloud differences. In this way, the information from the intensity channel is often ignored. Note that the intensity information is related to the reflectivity of the material and differs between objects, which can be useful for localization and object recognition. Therefore, we argue that a robust LiDAR SLAM system should take such intensity information into account. Although there are some recent works aiming to use intensity information to improve accuracy, they mainly adopt Convolutional Neural Networks (CNN) to fit the raw laser scan data without any specific intensity analysis or formulation.
Moreover, collection of large amounts of data is required to train neural networks, which can be troublesome in practical applications.

In our previous work [9], we demonstrated that intensity information is very useful for robust loop closure detection. In this work we further utilize the intensity information to improve the localization accuracy of a SLAM system. We propose a novel SLAM framework that uses both geometry and intensity information for odometry estimation. We first analyze the physical model of intensity measurement. Then we introduce an extra intensity cost alongside the existing geometric-only cost in the odometry estimation formulation. Finally, we combine intensity-based loop closure detection and back-end optimization to further improve the performance. The proposed framework is tested using an indoor warehouse robot as well as an outdoor autonomous driving car. The results indicate that our method provides reliable and accurate localization in multiple environments and outperforms geometric-only methods. The main contributions of this paper are as below:

• We propose a novel SLAM framework that uses both intensity and geometry information for localization estimation, which involves front-end odometry estimation and back-end factor graph optimization. The proposed method is open sourced at https://github.com/wh200720041/intensity_slam.

• We propose to build an intensity map to reveal the intensity distribution and introduce an intensity cost to the existing geometry-only LiDAR SLAM to estimate the robot's location.

• A thorough evaluation of the proposed method is presented. More specifically, our method is tested in both warehouse and autonomous driving scenarios. The results show that our method outperforms the existing geometric-only methods in multiple scenarios.

This paper is organized as follows: Section II reviews the related work on existing LiDAR SLAM approaches. Section III describes the details of the proposed approach, including intensity calibration, feature selection, odometry estimation, and Intensity Scan Context (ISC) based graph optimization. Section IV shows the experimental results and comparison with some existing works, followed by the conclusion in Section V.

II. RELATED WORK

The majority of the existing works on LiDAR SLAM focus on geometry information of the environment. One of the most popular methods for point cloud matching is the Iterative Closest Point (ICP) method, which matches the current point to the nearest point in the target frame [10]. ICP iteratively searches for the optimal point correspondence by minimizing the Euclidean distance between point pairs until the transformation matrix converges. The algorithm is popularly used in SLAM systems such as HDL-Graph SLAM [11] and LiDAR-Only Odometry and Localization (LOL) [12]. However, all points are used for calculation, which is computationally expensive for a LiDAR with tens of thousands of points per scan. ICP is also sensitive to noise. In real applications such as autonomous driving, the measurement noise (e.g., measurements from trees alongside a road) can be significant and subsequently cause localization drifts. A more robust and computationally efficient way is to leverage geometric features. In LiDAR Odometry And Mapping (LOAM) [7], a simple feature extraction strategy is introduced by analyzing the local smoothness.
The feature points are segmented into edge features and planar features based on local smoothness, and the robot pose is subsequently calculated by doing edge-to-edge and planar-to-planar matching respectively. A similar idea is also used in Lightweight and Ground-Optimized LiDAR Odometry And Mapping (LeGO-LOAM) [13], which targets a computationally efficient LiDAR SLAM for Unmanned Ground Vehicles (UGVs). A laser scan is first segmented into ground points and non-ground points. The planar features are extracted from ground points while the edge features are extracted from non-ground points. The planar features are used to estimate the roll and pitch angles as well as the z-translation. The results are subsequently used in the matching of non-ground points to calculate the x and y translations as well as the yaw angle. LeGO-LOAM has achieved higher localization accuracy than LOAM for grounded robots. However, the intensity channel is ignored and only the geometry channel is used in those methods.

In recent years, some works attempt to introduce the intensity channel into SLAM via deep learning. In DeepICP, an end-to-end learning-based 3D point cloud registration is proposed to find the robot pose [14]. Both the intensity channel and the geometry channel are fed into a Deep Feature Extraction (DFE) layer to find the keypoints. Rather than searching for the closest point, a Corresponding Points Generation (CPG) layer is used to generate keypoint correspondences based on learned matching probabilities among a group of candidates, which makes back-propagation possible. DeepICP is validated on both the KITTI [15] and Apollo-SouthBay [16] datasets and outperforms existing methods such as Generalized-ICP [17] and the Normal Distribution Transform (NDT) [18]. Chen et al. also introduced an end-to-end CNN framework to identify the overlap of two laser scans for back-end SLAM [19]. The geometry channel as well as the intensity channel are trained through an 11-layer network to generate a 3-dimensional feature. The overlap is estimated by applying another smaller layer to the extracted features, and the yaw angle change is estimated by the cross correlation of the trained features. The experiment on the KITTI dataset shows that the yaw angle estimated from both intensity and geometry channels is better than that of the geometric-only deep-learning-based framework. These deep-learning-based methods adopt end-to-end training without further analysis of intensity information. However, it is often difficult and time consuming to collect, label and train on data in practice. Moreover, the performance under environment changes may not be consistent, i.e., the trained network may perform well in an environment similar to the training data, but fail when transferred to another environment.

Some other works attempt to use the intensity channel and integrate intensity into pose estimation without end-to-end training.
Fig. 2: System overview of the proposed Intensity-SLAM. In the front-end, we analyze both geometry and intensity features to estimate odometry. In the back-end, we use intensity scan context for loop closure detection and global pose graph optimization.

Tian et al. propose an intensity-assisted ICP for fast registration of 2D-LiDAR [20]. Compared to the traditional ICP algorithm, a target function is introduced to determine the initial rigid-body transformation estimate based on the spatial distance and intensity residual. The number of iterations is reduced by 10 times, so that the proposed algorithm is able to operate in real time on a single-core Central Processing Unit (CPU). In [21], an intensity model is incorporated into the Sparse Bundle Adjustment (SBA) [22] estimation problem. The authors analyze the physical model of intensity measurement and propose a new measurement model that augments 3D localization, intensity and surface normal. Simulation results show that the addition of intensity measurements achieves similar SLAM accuracy with fewer landmarks. This work is further extended in [23] with the experiment performed using a Time of Flight (ToF) camera. The key-points are selected and tracked via Scale Invariant Feature Transform (SIFT) [24] or Binary Robust Invariant Scalable Keypoints (BRISK) [25] feature extraction from intensity imagery. The experiment mainly focuses on Visual Odometry (VO) via computer vision techniques, but the performance of LiDAR SLAM in large scale environments is not demonstrated. Khan et al. propose a data driven approach to calibrate the intensity information [26]. The scan matching is then solved by minimizing the intensity residual of the point pairs rather than the geometry residual. It is subsequently integrated into Hector SLAM [27], which significantly reduces the drifting error compared to the original Hector SLAM. However, it is limited to 2D localization and mapping in a small scale environment.

III. METHODOLOGY

The proposed framework of Intensity-SLAM consists of both front-end laser odometry and back-end loop closure detection. The system overview is shown in Fig. 2. The front-end includes intensity calibration, feature extraction and odometry estimation, while the back-end includes loop closure detection and pose graph optimization.
A. Intensity Calibration
LiDAR emits laser beams and measures the arrival time as well as the energy of the reflected signals. The object's position in sensor coordinates is determined by the emitting angle and the distance (i.e., the arrival time). The intensity value is determined by the ratio between the energy received from the reflected signals and the emitted laser power. The received power P_r can be modeled as [21]:

P_r = \frac{P_e D_r^2 \rho}{4 R^2} \eta_{sys} \eta_{atm} \cos\alpha,   (1)

where P_e is the power of the emitted laser beam, D_r is the receiver aperture diameter, η_sys is the system transmission factor, η_atm is the atmospheric transmission factor, α is the incident angle between the object surface and the laser beam, R is the measured distance, and ρ is the material reflectivity of the object. Specifically, the distance measurement R and the incident angle α are extrinsic parameters, while η_sys and η_atm are constant parameters. Therefore, the intensity measurement I is determined by [21]:

I = \frac{P_r}{P_e} = \frac{D_r^2 \rho}{4 R^2} \eta_{sys} \eta_{atm} \cos\alpha = \eta_{all} \frac{\rho \cos\alpha}{R^2},   (2)

where η_all is a constant. Therefore, the surface reflectivity ρ is related only to the incident angle α and the measured distance R by

\rho \propto \frac{I R^2}{\cos\alpha}.   (3)

For a LiDAR scan, the distance can be easily measured, and the incident angle can be estimated by analyzing the local normal. For each point p_i ∈ R^3 we can search for the nearest two points p_{i1} and p_{i2}, so that the local surface normal n can be expressed as:

n = \frac{(p_{i1} - p_i) \times (p_{i2} - p_i)}{|p_{i1} - p_i| \cdot |p_{i2} - p_i|},   (4)

where × is the cross product. The incident angle is given by

\cos\alpha = \frac{p_i^T \cdot n}{|p_i|}.   (5)

Therefore, we can derive the calibrated intensity scan Ĩ from a raw LiDAR scan with geometric reading P and intensity reading I. Note that in some sensors the intensity is partially calibrated with the distance measurement by default, hence we only apply the calibration to the incident angle. Moreover, a low intensity value often implies a low Signal-to-Noise Ratio (SNR), which also deteriorates the ranging accuracy of low-value intensity measurements [28]. Points with low intensity are therefore ignored in practice.
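As a concrete illustration, the following C++/Eigen snippet applies Eqs. (4)-(5) to a single point. It is a minimal sketch, assuming the two nearest neighbors have already been found (e.g., with a k-d tree); the point type, function names, and thresholds are our own illustrative choices, not the released implementation.

```cpp
#include <Eigen/Dense>
#include <algorithm>
#include <cmath>

struct PointXYZI {
  Eigen::Vector3d p;  // position in the sensor frame
  double intensity;   // raw (range-compensated) intensity reading
};

// Calibrate one point given its two nearest neighbors p1 and p2 (Eqs. 4-5).
// Returns a value proportional to the surface reflectivity rho (Eq. 3);
// range compensation is assumed done by the sensor, so only the incident
// angle is corrected, as described in Sec. III-A.
double CalibrateIntensity(const PointXYZI& pt,
                          const Eigen::Vector3d& p1,
                          const Eigen::Vector3d& p2,
                          double min_intensity = 1e-2) {
  if (pt.intensity < min_intensity) return 0.0;  // low SNR: discard

  // Local surface normal from the two neighbors (Eq. 4).
  Eigen::Vector3d n = (p1 - pt.p).cross(p2 - pt.p);
  if (n.norm() < 1e-9) return pt.intensity;      // degenerate neighborhood
  n.normalize();

  // Cosine of the incident angle between the beam and the normal (Eq. 5).
  double cos_alpha = std::abs(pt.p.dot(n)) / pt.p.norm();
  cos_alpha = std::max(cos_alpha, 0.1);          // guard against grazing angles

  return pt.intensity / cos_alpha;
}
```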
B. Salient Point Selection and Feature Extraction

A LiDAR scan often consists of tens of thousands of points. It is computationally inefficient to use raw point cloud matching methods such as ICP. Moreover, the raw data contains noisy measurements that can reduce the matching accuracy, e.g., measurements from trees alongside a road in autonomous driving. Hence it is more robust and computationally efficient to match point clouds with features [7], [13]. Instead of using geometric-only features, in this paper we utilize features based on both geometry and intensity information. The calibrated intensity information contains the reflectivity profile of the environment, which reveals the distribution of different objects. Therefore, the intensity information also helps to recognize identical features across multiple frames. For each point p_i ∈ P with intensity value η̃_i ∈ Ĩ, we search for the nearby points N_i ∈ P and calculate the local distance distribution σ_i^G and intensity distribution σ_i^I by:

\sigma_i^G = \frac{1}{|\mathcal{N}_i|} \sum_{p_j \in \mathcal{N}_i} |p_i - p_j|, \quad \sigma_i^I = \frac{1}{|\mathcal{N}_i|} \sum_{p_j \in \mathcal{N}_i} |\eta_i - \eta_j|,   (6)

where |N_i| is the number of neighboring points. The features are selected based on the weighted summation:

\sigma_i = w_G \cdot \sigma_i^G + w_I \cdot \sigma_i^I,   (7)

where w_G and w_I are the weights of the geometry and intensity distributions, respectively. Note that for flat surfaces such as walls the smoothness values are small, while for corner or edge points the smoothness values are large. Thus, for each scan, edge features P_E ∈ P are selected from the points with large σ, and planar features P_S ∈ P are selected from the points with small σ.
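A minimal sketch of the combined smoothness of Eqs. (6)-(7), assuming the neighbor indices of each point are precomputed; the weights, names, and container layout are illustrative assumptions rather than the released code.

```cpp
#include <Eigen/Dense>
#include <cmath>
#include <vector>

// Combined geometry-intensity smoothness for point i (Eqs. 6-7).
double CombinedSmoothness(const std::vector<Eigen::Vector3d>& pts,
                          const std::vector<double>& intensity,
                          const std::vector<int>& neighbors, int i,
                          double w_g = 1.0, double w_i = 1.0) {
  double sigma_g = 0.0, sigma_i = 0.0;
  for (int j : neighbors) {
    sigma_g += (pts[i] - pts[j]).norm();               // local distance spread
    sigma_i += std::abs(intensity[i] - intensity[j]);  // local intensity spread
  }
  sigma_g /= neighbors.size();
  sigma_i /= neighbors.size();
  return w_g * sigma_g + w_i * sigma_i;                // Eq. (7)
}
// Points with the largest values are kept as edge features,
// those with the smallest values as planar features.
```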
C. Intensity Map Building

The intensity map M contains the reflectivity distribution of the surrounding environment. In most geometric-only SLAM systems, the map is maintained and updated via an occupancy grid [29] or OctoMap [30]: the 3-D space is segmented into grid cells and each cell is represented by a probability function. A similar idea can be used to construct and update an intensity map. Instead of a probability function, we use the intensity observation to represent each grid cell m_i [26]. More specifically, for each observation of a grid cell at time t, the surface reflectivity is updated by:

M(m_i | z_{1:t}) = M(m_i | z_{1:t-1}) + \frac{\eta_{m_i} - M(m_i | z_{1:t-1})}{n_{m_i}},   (8)

where M(m_i | z_{1:t}) is the current intensity estimate of the cell and n_{m_i} is the total number of observations of cell m_i. Note that if a grid cell contains no object, its intensity is marked as 0 since no signal is reflected.
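Eq. (8) is an incremental mean, so each cell only needs its running value and an observation count. Below is a sketch over a hypothetical hashed voxel grid; the container and key layout are our own choices, not the paper's.

```cpp
#include <cstdint>
#include <unordered_map>

struct IntensityCell {
  double intensity = 0.0;  // M(m_i | z_{1:t})
  int count = 0;           // n_{m_i}: observations of this cell so far
};

class IntensityMap {
 public:
  explicit IntensityMap(double resolution) : res_(resolution) {}

  // Fold one intensity observation eta into the cell containing (x, y, z).
  void Update(double x, double y, double z, double eta) {
    IntensityCell& c = cells_[Key(x, y, z)];
    ++c.count;
    c.intensity += (eta - c.intensity) / c.count;  // incremental mean, Eq. (8)
  }

 private:
  // Pack quantized coordinates into one key (21 bits per axis;
  // coarse hashing is sufficient for a sketch).
  int64_t Key(double x, double y, double z) const {
    auto q = [this](double v) {
      return static_cast<int64_t>(v / res_) & 0x1FFFFF;
    };
    return (q(x) << 42) | (q(y) << 21) | q(z);
  }

  double res_;
  std::unordered_map<int64_t, IntensityCell> cells_;
};
```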
D. Scan to Map Matching

Laser odometry is the task of estimating the transformation matrix T ∈ SE(3) from the current frame to the global map. The optimal pose estimate is obtained by minimizing both the geometry error and the intensity error.
1) Geometry Residual:
Similar to LOAM [7], the geometric error is calculated by matching the current edge and planar features with the global map. This is achieved by minimizing the point-to-edge and point-to-plane residuals. Given an edge feature p_i ∈ P_E and the transformed point p̂_i = T p_i, we search for the two nearest points p_{E1} and p_{E2} in the global map. The point-to-edge residual is defined by:

f_E(\hat{p}_i) = \frac{|(\hat{p}_i - p_{E1}) \times (\hat{p}_i - p_{E2})|}{|p_{E1} - p_{E2}|}.   (9)

Similarly, given a planar feature point p_k ∈ P_S and the transformed point p̂_k, we search for the three nearest points p_{S1}, p_{S2}, and p_{S3} in the global map. The point-to-plane residual is defined by:

f_S(\hat{p}_k) = (\hat{p}_k - p_{S1})^T \cdot \frac{(p_{S1} - p_{S2}) \times (p_{S1} - p_{S3})}{|(p_{S1} - p_{S2}) \times (p_{S1} - p_{S3})|}.   (10)
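Both residuals reduce to a few Eigen expressions. The sketch below assumes the nearest map points have already been retrieved (e.g., from a k-d tree over the feature map) and mirrors Eqs. (9)-(10); the function names are ours.

```cpp
#include <Eigen/Dense>
using Eigen::Vector3d;

// Point-to-edge residual (Eq. 9): distance from the transformed edge point
// p_hat to the line through the two nearest map points pE1, pE2.
double EdgeResidual(const Vector3d& p_hat,
                    const Vector3d& pE1, const Vector3d& pE2) {
  return ((p_hat - pE1).cross(p_hat - pE2)).norm() / (pE1 - pE2).norm();
}

// Point-to-plane residual (Eq. 10): signed distance from the transformed
// planar point to the plane spanned by the three nearest map points.
double PlaneResidual(const Vector3d& p_hat, const Vector3d& pS1,
                     const Vector3d& pS2, const Vector3d& pS3) {
  const Vector3d n = (pS1 - pS2).cross(pS1 - pS3);
  return (p_hat - pS1).dot(n.normalized());
}
```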
2) Intensity Residual:
The intensity residual is calculated by matching the features with the intensity map. This is achieved by minimizing the intensity residual between the current point p_i (including both edge and planar features) and the transformed point p̂_i in the intensity map:

f_I(\hat{p}_i) = \eta_i - M(\hat{p}_i),   (11)

where M(p̂_i) is the intensity value of the point in the intensity map M. To look up the intensity information in the intensity map, we introduce trilinear interpolation. Although it is more straightforward to represent the intensity by the nearest grid cell, that intensity information is less accurate, especially for large scale mapping where the grid resolution is low.

Fig. 3: Trilinear interpolation.

For each transformed point p̂_i = [x_i, y_i, z_i]^T, we can find the surrounding eight grid cells. As shown in Fig. 3, the intensity measurements of these grid cells are denoted M_1(p̂_i), M_2(p̂_i), ..., M_8(p̂_i). Let the center position of cell 1 (closest to the origin) be p_1 = [x_1, y_1, z_1]^T and the center position of cell 8 (farthest from the origin) be p_8 = [x_2, y_2, z_2]^T, where x_2 - x_1, y_2 - y_1, and z_2 - z_1 are the width, height and depth of each grid cell, respectively. The intensity estimate interpolated along the x-axis is calculated as:

M_{12}(\hat{p}_i) = \frac{x_2 - x_i}{x_2 - x_1} \cdot M_1(\hat{p}_i) + \frac{x_i - x_1}{x_2 - x_1} \cdot M_2(\hat{p}_i).   (12)

Similarly, we can derive M_{34}, M_{56} and M_{78}. The intensity value is then estimated as:

M_{1234}(\hat{p}_i) = \frac{y_2 - y_i}{y_2 - y_1} \cdot M_{12}(\hat{p}_i) + \frac{y_i - y_1}{y_2 - y_1} \cdot M_{34}(\hat{p}_i),
M_{5678}(\hat{p}_i) = \frac{y_2 - y_i}{y_2 - y_1} \cdot M_{56}(\hat{p}_i) + \frac{y_i - y_1}{y_2 - y_1} \cdot M_{78}(\hat{p}_i),
M(\hat{p}_i) = \frac{z_2 - z_i}{z_2 - z_1} \cdot M_{1234}(\hat{p}_i) + \frac{z_i - z_1}{z_2 - z_1} \cdot M_{5678}(\hat{p}_i).
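The chain above is the standard trilinear scheme. A compact sketch, with our own ordering convention for the eight cell intensities (M[0] for cell 1 through M[7] for cell 8):

```cpp
#include <array>

// Trilinear interpolation of the intensity map at (xi, yi, zi), following
// Eq. (12); (x1, y1, z1) and (x2, y2, z2) are the centers of cells 1 and 8.
double TrilinearIntensity(const std::array<double, 8>& M,
                          double xi, double yi, double zi,
                          double x1, double y1, double z1,
                          double x2, double y2, double z2) {
  const double tx = (xi - x1) / (x2 - x1);  // normalized offsets in the block
  const double ty = (yi - y1) / (y2 - y1);
  const double tz = (zi - z1) / (z2 - z1);
  auto lerp = [](double a, double b, double t) { return (1.0 - t) * a + t * b; };

  // Interpolate along x, then y, then z, mirroring M12, M34, ..., M5678.
  const double m12 = lerp(M[0], M[1], tx), m34 = lerp(M[2], M[3], tx);
  const double m56 = lerp(M[4], M[5], tx), m78 = lerp(M[6], M[7], tx);
  const double m1234 = lerp(m12, m34, ty), m5678 = lerp(m56, m78, ty);
  return lerp(m1234, m5678, tz);
}
```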
3) Pose Estimation:
The final pose is estimated by minimizing both the geometry residuals and the intensity residual:

T^* = \arg\min_{T} \left( \sum_{p_i \in P_E} |f_E(\hat{p}_i)| + \sum_{p_i \in P_S} |f_S(\hat{p}_i)| + \sum_{p_i \in P} |f_I(\hat{p}_i)| \right).   (13)

This problem can be solved by the Levenberg-Marquardt algorithm [31]. Note that the initial pose alignment is estimated by assuming constant angular and linear velocity, which increases the convergence speed compared to initializing with the identity transformation.
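The constant-velocity initial guess amounts to replaying the last inter-frame motion once more; a sketch, assuming the two previous optimized poses are kept:

```cpp
#include <Eigen/Geometry>

// Constant-velocity prediction used to seed the Levenberg-Marquardt solve:
// reapply the relative motion between the last two frames.
Eigen::Isometry3d PredictInitialPose(const Eigen::Isometry3d& T_prev2,
                                     const Eigen::Isometry3d& T_prev) {
  const Eigen::Isometry3d delta = T_prev2.inverse() * T_prev;
  return T_prev * delta;
}
```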
E. Loop Closure Detection & Global Optimization

The goal of loop closure detection is to identify a revisited scene from historical data. For a front-end-only SLAM system, drift from measurement noise is inevitable. To reduce the localization drift, the back-end SLAM detects loops by recognizing identical places. We use key frame selection to reduce the retrieval time, since the computational cost of loop closure detection is often high. The key frames are selected based on the following criteria: (1) the displacement of the robot is significant, i.e., greater than a pre-defined threshold; (2) the change of rotation angles is significant; (3) the time elapsed is more than a certain period. For large scale environments, the thresholds are set higher to reduce the computational cost. All key frames are stored in a pose graph maintained in the back-end.

For each key frame, we use the Intensity Scan Context (ISC) [9] to encode the current frame into a global signature. Compared to geometric-only descriptors such as GLAROT3D [32], NBLD [33] and Scan Context [34], the ISC is robust to rotation change in identifying a loop closure. The ISC is a 2D matrix calculated by equally dividing the polar coordinates in the azimuthal and radial directions into N_s sectors and N_r rings. Each subspace is represented by the maximum intensity of the points within the area. Given a key frame, we can extract the ISC descriptor Ω from both intensity and geometry information. To compare a query ISC descriptor Ω^q with a candidate ISC descriptor Ω^c, let v_i^q and v_i^c be the i-th columns of Ω^q and Ω^c. The similarity score φ(Ω^q, Ω^c) is found by taking the mean cosine similarity over the sectors:

\varphi(\Omega^q, \Omega^c) = \frac{1}{N_s} \sum_{i=0}^{N_s - 1} \frac{(v_i^q)^T \cdot v_i^c}{\|v_i^q\| \cdot \|v_i^c\|}.   (14)

A loop closure can be determined by setting a threshold on φ(Ω^q, Ω^c).

Loop closure detection is able to efficiently identify loop pairs. However, false detections can lead to failure of the pose graph optimization. In order to prevent false positives, geometry consistency verification is used to check the similarity of candidate frames. For the candidate loop frame, we search for the nearby line and plane information from the global map. For the current frame, edge and planar features are extracted and matched to the corresponding global lines and planes by minimizing the point-to-edge and point-to-plane distances. When two frames are unrelated, the sum of distances is often high. Hence false positives can be filtered out by setting a threshold on it. With the revisited place identified, we can add an edge between the two frames into the pose graph, and global optimization can be applied to correct the drift [35].
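For illustration, the similarity score of Eq. (14) can be computed as below; the column-shift search that ISC [9] uses to handle rotation is omitted for brevity, and the function name is ours.

```cpp
#include <Eigen/Dense>

// Mean column-wise cosine similarity between a query and a candidate ISC
// descriptor (Eq. 14); both are Nr x Ns matrices (rings x sectors).
double IscSimilarity(const Eigen::MatrixXd& Omega_q,
                     const Eigen::MatrixXd& Omega_c) {
  const int Ns = static_cast<int>(Omega_q.cols());
  double score = 0.0;
  for (int i = 0; i < Ns; ++i) {
    const double nq = Omega_q.col(i).norm();
    const double nc = Omega_c.col(i).norm();
    if (nq > 1e-9 && nc > 1e-9)
      score += Omega_q.col(i).dot(Omega_c.col(i)) / (nq * nc);
  }
  return score / Ns;  // a loop is declared when this exceeds a threshold
}
```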
IV. EXPERIMENT EVALUATION

A. Experiment Setup

In this section we present comprehensive experiments with the proposed Intensity-SLAM, including an evaluation on a public dataset and practical experiments on a real robot platform. For the public dataset test, the proposed method is evaluated on KITTI [15], which is commonly used for SLAM evaluation. The algorithm is implemented in C++ on Robot Operating System (ROS) Melodic and Ubuntu 18.04. In order to measure the computational cost of the different approaches, all algorithms are tested on a desktop computer equipped with an Intel 6-core i7-8700 CPU; the instruction set used by Eigen is SSE2. In the practical test, the proposed method is integrated into an Automated Guided Vehicle (AGV) in a warehouse environment. In the AGV test, our algorithm runs on an Intel NUC mini PC with an Intel i7-10710U processor, which is commonly used on mobile robots.
B. Evaluation Metric
To evaluate the localization accuracy, we use the Average Translational Error (ATE) and Average Rotational Error (ARE) [15]:

E_{rot}(\mathcal{F}) = \frac{1}{|\mathcal{F}|} \sum_{(i,j) \in \mathcal{F}} \angle[\hat{T}_j \hat{T}_i^{-1} T_i T_j^{-1}],
E_{trans}(\mathcal{F}) = \frac{1}{|\mathcal{F}|} \sum_{(i,j) \in \mathcal{F}} \| \hat{T}_j \hat{T}_i^{-1} T_i T_j^{-1} \|,   (15)

where F is a set of frame pairs (i, j), T and T̂ are the estimated and true LiDAR poses respectively, and ∠[·] is the rotation angle. The computational cost is measured by the average processing time per frame.
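A sketch of the per-pair error inside Eq. (15), assuming Eigen isometries for the estimated and ground-truth poses; averaging the two components over all pairs in F yields ARE and ATE.

```cpp
#include <Eigen/Geometry>

struct PairError {
  double rot;    // rotation angle of the error transform (rad)
  double trans;  // translation norm of the error transform (m)
};

// Error transform of Eq. (15) for one frame pair (i, j):
// T_* are estimated poses, T_hat_* the ground-truth poses.
PairError RelativeError(const Eigen::Isometry3d& T_i,
                        const Eigen::Isometry3d& T_j,
                        const Eigen::Isometry3d& T_hat_i,
                        const Eigen::Isometry3d& T_hat_j) {
  const Eigen::Isometry3d E =
      (T_hat_j * T_hat_i.inverse()) * (T_i * T_j.inverse());
  const Eigen::AngleAxisd aa(E.rotation());
  return {aa.angle(), E.translation().norm()};
}
```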
Fig. 4: Result of the proposed method on KITTI dataset. From left to right are Sequence 00, 02, 05, 07, and 09. The resultof intensity-SLAM is plotted in green and the ground truth is plotted in red.
C. Evaluation on Public Dataset
The KITTI dataset is collected from an autonomous drivingcar running in a large scale urban area. It is challenging due tothe complex environment with both static and dynamic objects.The proposed method is firstly evaluated on the KITTI datasetwith various scenarios such as urban, highway and country. Intotal we tested over more than 10,000 frames with more than10 km travelling distance. Some of the experimental resultsare shown in Fig. 4, in which Sequence 00, 02, 05, 07, and 09are selected for demonstration. The trajectory generated fromthe proposed method is in green and the ground truth is inred. It can be seen that our method is able to track the robot’spose accurately.
D. Ablation Study of Intensity Information
We also demonstrate the effect of intensity information inlocalization. To be consistent, the loop closure detection isremoved and only front-end laser odometry is compared. Inthe experiment, the “geometric-only laser odometry” methodonly includes point-to-edge and point-to-plane residual, while“odometry with intensity” method integrates intensity residualinto the geometric-only laser odometry. All other parametersare kept the same. Moreover, we also include the result ofthe Iterative Closest Point method for comparison. The resultsare shown in Fig. 5 and Table I. In particular, we use KITTISequence 05 for illustration. It can be seen that by adding theintensity residual, the overall ATE is reduced from . to . and the overall ARE is reduced from . × − deg /m to . × − deg /m . The intensity information improves thetranslational and rotational accuracy by around 10%, while itdoes not increase the computing time too much. E. Comparison with Existing Methods
We also compare our approach with state-of-the-art methodsincluding LeGO-LOAM [13], A-LOAM, and HDL-GRAPH-SLAM [8]. All approaches are tested using the same settingsand the results are shown in Table II. It can be seen from thetable that our method is competitive in terms of computationalcost. And it is worthy noting that our method also achievesthe best accuracy and outperforms existing geometric-onlymethods. -200 0 2000100200300400
Fig. 5: Performance comparison on KITTI Sequence 05. The plotted trajectories are ICP, odometry (geometric-only), odometry (with intensity), the proposed method, and the ground truth; the asterisk marks the trajectory start.
Method                      Time (ms)   ATE (%)   ARE (deg/m)
Iterative Closest Point     142         1.031     0.0092
Odometry (geometric-only)   —           —         —
Intensity-SLAM              —           —         —

TABLE I: Result analysis of the intensity-aided and non-intensity-aided methods on the KITTI dataset.
F. Validation on Practical Robots
In this section, we test the performance of our method on a practical robot in a warehouse environment. To this end, we implement our algorithm on an industrial AGV that is used for grabbing and transporting materials. As shown in Fig. 6 (a), the AGV is equipped with an Intel NUC mini computer and a Velodyne VLP-16 LiDAR for perception and localization. In this experiment, the robot uses Intensity-SLAM for localization and is tasked to fetch items from the target storage shelf and return. The localization and mapping result is shown in Fig. 6 (b), with the robot's trajectory plotted in red. It can be seen that our method is able to provide reliable localization and generate a dense 3D map for a complex indoor environment. In particular, the robot is manually controlled by a joystick to return to the same location, and we record the final localization drift.
Fig. 6: Localization and mapping result in the warehouse environment. (a) The AGV platform for warehouse manipulation. (b) Localization and mapping result of the proposed method. The robot's trajectory is plotted in red.
Methods          Time (ms)   ATE (%)   ARE (deg/m)
Intensity-SLAM   —           —         —
LeGO-LOAM        —           —         —

TABLE II: Performance comparison of different approaches on KITTI Sequence 05.

The final localization drift is about 5 cm, which is accurate enough for general indoor robotic applications.

V. CONCLUSION

In this paper, we present a novel intensity-assisted full SLAM framework for LiDAR-based localization systems. Existing methods mainly leverage geometric-only features to estimate location and ignore intensity information. We begin with the physical model of intensity measurement and illustrate the use of intensity information. We introduce an intensity map and an intensity residual for pose estimation to improve the localization accuracy. Moreover, we propose a full SLAM structure that consists of both front-end and back-end systems, namely Intensity-SLAM. Thorough experiments are performed in different scenarios, including indoor AGV mapping in a warehouse and outdoor autonomous driving in urban areas. The experimental results show that integrating intensity information significantly improves the localization accuracy. The proposed method also achieves much lower translational and rotational errors compared to state-of-the-art geometric-only methods. Our method is publicly available.

REFERENCES

[1] D. Sun, F. Geißer, and B. Nebel, "Towards effective localization in dynamic environments," 2016, pp. 4517-4523.
[2] E. Stenborg, C. Toft, and L. Hammarstrand, "Long-term visual localization using semantically segmented images," 2018, pp. 6484-6490.
[3] R. Liu, C. Yuen, T.-N. Do, D. Jiao, X. Liu, and U.-X. Tan, "Cooperative relative positioning of mobile users by fusing IMU inertial and UWB ranging information," 2017, pp. 5623-5629.
[4] M. Sweatt, A. Ayoade, Q. Han, J. Steele, K. Al-Wahedi, and H. Karki, "WiFi based communication and localization of an autonomous mobile robot for refinery inspection," 2015, pp. 4490-4495.
[5] Q. Li, J. P. Queralta, T. N. Gia, Z. Zou, and T. Westerlund, "Multi-sensor fusion for navigation and mapping in autonomous vehicles: Accurate localization in urban environments,"
Unmanned Systems, vol. 8, no. 3, pp. 229-237, 2020.
[6] C. Debeunne and D. Vivet, "A review of visual-lidar fusion based simultaneous localization and mapping," Sensors, vol. 20, no. 7, p. 2068, 2020.
[7] J. Zhang and S. Singh, "LOAM: Lidar Odometry and Mapping in real-time," in Robotics: Science and Systems, vol. 2, no. 9, 2014.
[8] K. Koide, J. Miura, and E. Menegatti, "A portable 3D LIDAR-based system for long-term and wide-area people behavior measurement," IEEE Trans. Hum. Mach. Syst., 2018.
[9] H. Wang, C. Wang, and L. Xie, "Intensity scan context: Coding intensity and geometry relations for loop closure detection," 2020, pp. 2095-2101.
[10] D. Chetverikov, D. Svirko, D. Stepanov, and P. Krsek, "The trimmed iterative closest point algorithm," in Object Recognition Supported by User Interaction for Service Robots, vol. 3. IEEE, 2002, pp. 545-548.
[11] K. Koide, J. Miura, and E. Menegatti, "A portable three-dimensional LIDAR-based system for long-term and wide-area people behavior measurement," International Journal of Advanced Robotic Systems, vol. 16, no. 2, p. 1729881419841532, 2019.
[12] D. Rozenberszki and A. Majdik, "LOL: Lidar-Only Odometry and Localization in 3D point cloud maps," 2020, pp. 4379-4385.
[13] T. Shan and B. Englot, "LeGO-LOAM: Lightweight and ground-optimized lidar odometry and mapping on variable terrain," 2018, pp. 4758-4765.
[14] W. Lu, G. Wan, Y. Zhou, X. Fu, P. Yuan, and S. Song, "DeepICP: An end-to-end deep neural network for 3D point cloud registration," arXiv preprint arXiv:1905.04153, 2019.
[15] A. Geiger, P. Lenz, and R. Urtasun, "Are we ready for autonomous driving? The KITTI vision benchmark suite," in Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[16] W. Lu, Y. Zhou, G. Wan, S. Hou, and S. Song, "L3-Net: Towards learning based LiDAR localization for autonomous driving," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 6389-6398.
[17] A. Segal, D. Haehnel, and S. Thrun, "Generalized-ICP," in Robotics: Science and Systems, vol. 2, no. 4. Seattle, WA, 2009, p. 435.
[18] T. Stoyanov, M. Magnusson, H. Andreasson, and A. J. Lilienthal, "Fast and accurate scan registration through minimization of the distance between compact 3D NDT representations," The International Journal of Robotics Research, vol. 31, no. 12, pp. 1377-1393, 2012.
[19] X. Chen, T. Läbe, A. Milioto, T. Röhling, O. Vysotska, A. Haag, J. Behley, and C. Stachniss, "OverlapNet: Loop closing for LiDAR-based SLAM," in Proceedings of Robotics: Science and Systems (RSS), 2020.
[20] Y. Tian, X. Liu, L. Li, and W. Wang, "Intensity-assisted ICP for fast registration of 2D-LIDAR," Sensors, vol. 19, no. 9, p. 2124, 2019.
[21] R. A. Hewitt and J. A. Marshall, "Towards intensity-augmented SLAM with LiDAR and ToF sensors," 2015, pp. 1956-1961.
[22] D. P. Frost, O. Kähler, and D. W. Murray, "Object-aware bundle adjustment for correcting monocular scale drift," 2016, pp. 4770-4776.
[23] R. Hewitt, "Intense navigation: Using active sensor intensity observations to improve localization and mapping," Ph.D. dissertation, 2018.
[24] D. G. Lowe, "Object recognition from local scale-invariant features," in Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 2, 1999, pp. 1150-1157.
[25] S. Leutenegger, M. Chli, and R. Y. Siegwart, "BRISK: Binary Robust Invariant Scalable Keypoints," 2011, pp. 2548-2555.
[26] S. Khan, D. Wollherr, and M. Buss, "Modeling laser intensities for simultaneous localization and mapping," IEEE Robotics and Automation Letters, vol. 1, no. 2, pp. 692-699, 2016.
[27] S. Kohlbrecher, J. Meyer, O. von Stryk, and U. Klingauf, "A flexible and scalable SLAM system with full 3D motion estimation," in Proc. IEEE International Symposium on Safety, Security and Rescue Robotics (SSRR), November 2011.
[28] J. Lin and F. Zhang, "Loam livox: A fast, robust, high-precision LiDAR odometry and mapping package for LiDARs of small FoV," 2020, pp. 3126-3131.
[29] S. Khan, D. Wollherr, and M. Buss, "Adaptive rectangular cuboids for 3D mapping," 2015, pp. 2132-2139.
[30] K. M. Wurm, A. Hornung, M. Bennewitz, C. Stachniss, and W. Burgard, "OctoMap: A probabilistic, flexible, and compact 3D map representation for robotic systems," in Proc. of the ICRA 2010 Workshop on Best Practice in 3D Perception and Modeling for Mobile Manipulation, vol. 2, 2010.
[31] J. J. Moré, "The Levenberg-Marquardt algorithm: implementation and theory," in Numerical Analysis. Springer, 1978, pp. 105-116.
[32] D. L. Rizzini, "Place recognition of 3D landmarks based on geometric relations," 2017, pp. 648-654.
[33] T. Cieslewski, E. Stumm, A. Gawel, M. Bosse, S. Lynen, and R. Siegwart, "Point cloud descriptors for place recognition using sparse visual information," 2016, pp. 4830-4836.
[34] G. Kim and A. Kim, "Scan context: Egocentric spatial descriptor for place recognition within 3D point cloud map," 2018, pp. 4802-4809.
[35] G. Grisetti, R. Kümmerle, C. Stachniss, and W. Burgard, "A tutorial on graph-based SLAM,"