UPSLAM: Union of Panoramas SLAM
Anthony Cowley, Ian D. Miller and Camillo Jose Taylor
Abstract — We present an empirical investigation of a new mapping system based on a graph of panoramic depth images. Panoramic images efficiently capture range measurements taken by a spinning lidar sensor, recording fine detail on the order of a few centimeters within maps of expansive scope on the order of tens of millions of cubic meters. The flexibility of the system is demonstrated by running the same mapping software against data collected by hand-carrying a sensor around a laboratory space at walking pace, moving it outdoors through a campus environment at running pace, driving the sensor on a small wheeled vehicle on- and off-road, flying the sensor through a forest, carrying it on the back of a legged robot navigating an underground coal mine, and mounting it on the roof of a car driven on public roads. The full 3D maps are built online with a median update time of less than ten milliseconds on an embedded NVIDIA Jetson AGX Xavier system.
I. INTRODUCTION
Simultaneous localization and mapping (SLAM) is a core capability underpinning general field robotics. It is a keystone for software needing to relate instantaneous readings from a robot's sensors to historical data – anything from what the robot perceived a fraction of a second in the past, to a pre-loaded map of a location – while allowing the robot to extend its corpus of knowledge about how its sensors perceive its surroundings. Relating current sensor data to a map permits localization, while computing transformations that explain how sensor data changes over time enables the extension of that map with new data. The degree to which existing techniques meet the challenge of SLAM depends upon factors such as [1]:

• Motion: Will the sensor be moved quickly or slowly? Over smooth or bumpy ground, or through the air?
• Geometry/Appearance: Does the target environment offer enough essential features (e.g. planes, corners, textures)?
• Performance: Accuracy, computational efficiency, and how the approach scales with environment size.

Existing mapping systems tend to work well when operating under conditions similar to those in which their original authors demonstrated their performance. But in exploration tasks, our robots reliably discover vantage points with wildly varying feature availability while experiencing rapid, unexpected motions as they traverse unfamiliar terrain. Bumping and bouncing, slipping and sliding, or executing rapid pivots all have the potential to momentarily interfere with motion estimation by reducing overlap between consecutive sensor readings, or by breaking assumed motion priors used to improve optimization convergence (e.g. planar motion). The environment itself plays a role, as transitions between enclosed and open spaces, or rounding a corner with poor visibility, can lead to rapid, significant changes in the types of visual and geometric features captured by a sensor. The challenges of these transient, yet frequently encountered, conditions may be diminished by ensuring that every scrap of information a sensor produces is leveraged by the SLAM system.

Consequently, we investigate a new SLAM method built to make use of as much range data as possible. The robustness of the approach is demonstrated along all three axes identified above by applying it to data collected by a sensor carried by a walking human, a running human, a small wheeled robot platform, a legged robot platform, an aerial robot platform, and an automobile, in environments from indoor clutter to underground tunnels to wild forests to public roads. Its performance is quantified, demonstrating trajectory error of 0.05% of the trajectory length, its ability to run within a 15 W power budget, and its scaling from representing the geometry of window mullions to coping with 21.5 million cubic meters of space attached to a 4.4 km trajectory.

In this work, the environment is represented as a series of panoramic depth maps. This choice allows us to represent our local maps using a 2D array based representation. Our implementation is able to effectively exploit the regular memory access patterns and parallelism that this representation affords. As a result, it runs very quickly even on embedded GPU platforms. An important consequence is that we are able to make use of all available range measurements and avoid the need for early decisions about features. We argue that using all of the data provides additional robustness, which allows the method to handle a wide variety of environments.

The authors are with the GRASP Laboratory, School of Engineering and Applied Sciences, University of Pennsylvania, Philadelphia, PA 19104. Corresponding author: [email protected]

II. Related Work
The SLAM problem has among the deepest wealth of research of any area of robotics. We draw particular inspiration from the idea of building an atlas [2] of local maps linked by rigid body transformations. As lidar sensors increase in resolution, keyframe techniques [3], [4] more commonly applied to camera-like sensors gain appeal. With such techniques, an instantaneous data capture serves as the initial value of a local map. In particular, we advocate the use of panoramic image keyframes [5] for their suitability to capture data gathered while rotating.

Fig. 1: Example keyframe panorama from a farm lane bordered by a fence. The upper image shows depth information, with lighter shades corresponding to greater depth, while the lower figure maps surface normal vector components to color.

Modern geometric SLAM systems, such as LOAM [6], have found success by winnowing the input data streams of high-resolution lidar through feature selection. In subsequent works, the authors improve robustness by adding complementary sensing modalities as fallbacks for when a primary sensor, typically a lidar, is unable to maintain continuous tracking [7]. LeGO-LOAM [8] is a notable specialization of LOAM to ground vehicle scenarios. It incorporates changes to point cloud filtering and feature selection that enable it to run at the sensor's update rate by tracking fewer features than the original LOAM formulation.

A parallel trend in mapping is driven by the great success found in the use of features detected by visible light and infrared sensitive cameras, with ORB-SLAM [9] a notable contributor to the popularization of monocular SLAM. It is able to extract and track features quickly enough for motion estimation, while the features it finds are also suitable for loop closing. The advantages of cameras are that they are inexpensive, mechanically robust, and historically endowed with more generous, or at least symmetric, fields of view than laser range finders. However, a limitation of many texture-based feature mapping systems is that they produce a sparse 3D reconstruction of a workspace as defined by the tracked features. This limitation is somewhat alleviated by direct methods such as Direct Sparse Odometry [10] that optimize differences in image intensity values rather than extracted features. Such techniques can produce denser models than feature-based approaches since direct methods leverage more of the available image data. Work combining the dense 3D scanning of RGB-D sensors with more discriminative ORB features has been shown to work for moderately sized scenes [11].

III. UPSLAM
Union of Panoramas SLAM (UPSLAM) represents a map as a graph whose nodes are panoramic depth images augmented with surface normal estimates, Figure 1, and whose edges are 3D transformations between those depth image keyframes. Keyframes may have different resolutions and extents than raw sensor data; we typically operate with a wider vertical field of view for keyframes than the sensor captures. The system keeps track of a current keyframe which is updated with each new lidar sweep. The experiments described here were conducted with an Ouster OS1-64 lidar, in which an array of 64 lasers covering a 33° vertical field of view is swept about an axis at 10 Hz, and an IMU provides accelerometer and gyroscope measurements at 100 Hz.

Fig. 2: Count of valid depth measurements over time for the Newer College Dataset.

A. Data Intake
During rapid motions, the 100 ms period of a lidar sweep may include significant sensor motion. Subsequent data processing will search for a rigid body transformation between 3D points derived from a single sweep and a model point set. The tension inherent in applying a single transformation to points collected at different sensor poses is mitigated by using high-frequency inertial measurements to rotate all range measurements into the rotational frame of a single point in time. The rotations based on IMU and lidar timing data are computed on the CPU, while virtually everything that follows is computed on the GPU.

The points resulting from this IMU motion correction are projected into a panoramic image. The image structure makes it convenient to compute a rough surface normal estimate as the cross product of the vectors defined by a central point and two of its image coordinate neighbors. These surface normal estimates, which are again represented as an image, are very noisy, but may be efficiently smoothed using an edge sensitive smoothing algorithm such as the à trous wavelet filter [12].
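The following sketch is ours, not the authors' released code: it shows, with numpy standing in for the paper's GPU kernels, how motion-corrected points can be binned into a panoramic depth image and how rough normals fall out of cross products of image-neighbor differences. The vertical field of view and collision handling are illustrative assumptions.

    import numpy as np

    def project_to_panorama(points, width=2048, height=256, v_fov=(-0.42, 0.42)):
        """Bin motion-corrected 3D points (N, 3) into a panoramic depth image.

        width/height match the paper's high-resolution keyframes; v_fov (radians)
        is an illustrative vertical field of view, wider than the sensor's 33 deg.
        """
        depth = np.linalg.norm(points, axis=1)
        azimuth = np.arctan2(points[:, 1], points[:, 0])            # [-pi, pi)
        elevation = np.arcsin(points[:, 2] / np.maximum(depth, 1e-9))
        cols = ((azimuth + np.pi) / (2 * np.pi) * width).astype(int) % width
        rows = ((v_fov[1] - elevation) / (v_fov[1] - v_fov[0]) * height).astype(int)
        valid = (rows >= 0) & (rows < height) & (depth > 0)
        pano = np.zeros((height, width))
        pts = np.zeros((height, width, 3))
        pano[rows[valid], cols[valid]] = depth[valid]   # last write wins here; a
        pts[rows[valid], cols[valid]] = points[valid]   # z-buffer min would be better
        return pano, pts

    def rough_normals(pts):
        """Cross product of vectors to the right and lower image neighbors."""
        dx = np.roll(pts, -1, axis=1) - pts             # neighbor in image column
        dy = np.roll(pts, -1, axis=0) - pts             # neighbor in image row
        n = np.cross(dx, dy)
        n /= np.maximum(np.linalg.norm(n, axis=2, keepdims=True), 1e-9)
        return n   # noisy; the paper smooths with an edge-aware a trous filter [12]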
B. Motion Estimation
Each incoming panorama of depth and surface normal vectors is registered to the current keyframe using ICP with a point-to-plane error metric [13], [14], Equation 1.

E(T) = \sum_{(s_1, s_2) \in C \,\wedge\, \Omega_{d,\theta}(s_1, s_2)} \left| (T s_2 - s_1) \cdot N(s_1) \right|    (1)

Here, we find a 3D transformation, T, that minimizes the distance between each pair of points drawn from the set of projective correspondences, C, when projecting the vector between corresponding points onto the normal vector of the first, N(s_1). A similarity filter, Ω, is applied to the set of correspondences to reject those for which the distance, d, between the two points is greater than some threshold, or for which the angle, θ, between their associated surface normal vector estimates is too great. This method may be efficiently implemented on a GPU [15].

The surface normal estimate for the sweep may be slightly over-smoothed by the à trous filter, but this smoothness helps with optimization convergence. The keyframe surface normal estimates are updated by each sweep, which restores some of the detail lost to smoothing. A critical factor in the robustness of the mapping system is how many points are incorporated into this optimization. Figure 2 shows the number of valid points over time for a typical experiment, where our notion of validity is defined as a point deriving from a valid range estimate (the sensor will not return a measurement for every pixel in every sweep) and an intensity above a small threshold (this rejects points whose range estimate is based on very little received light). In the example shown in the figure, the sensor starts out near a wall in a tunnel, but once it emerges into the open, 40 to 50 thousand points are used at the 10 Hz sweep rate.
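A hedged numpy sketch of evaluating the cost in Equation 1: row i of the sweep arrays is assumed to correspond to row i of the keyframe arrays (projective association), and the thresholds d_max and theta_max are illustrative stand-ins for the unstated parameters of Ω.

    import numpy as np

    def point_to_plane_cost(T, sweep_pts, sweep_normals, key_pts, key_normals,
                            d_max=0.5, theta_max=np.deg2rad(30.0)):
        """Evaluate Equation 1 over projectively-associated point arrays (N, 3)."""
        moved = sweep_pts @ T[:3, :3].T + T[:3, 3]      # T applied to sweep points
        diff = moved - key_pts                          # (T s2 - s1)
        dist = np.linalg.norm(diff, axis=1)
        rotated_n = sweep_normals @ T[:3, :3].T
        cos_angle = np.sum(rotated_n * key_normals, axis=1)
        # Omega_{d,theta}: reject distant pairs and pairs with disagreeing normals
        inliers = (dist < d_max) & (cos_angle > np.cos(theta_max))
        residuals = np.abs(np.sum(diff[inliers] * key_normals[inliers], axis=1))
        return residuals.sum(), inliers

An ICP iteration would minimize this cost over T (e.g. via a linearized least-squares solve) and then recompute correspondences; only the residual evaluation is shown.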
C. Keyframe Update

The optimized transformation provides a basis for a final set of projective correspondences, in which each keyframe pixel is projected onto the sweep depth image. Just as with the ICP optimization, a similarity filter is used to determine whether the new sweep data should be averaged into the keyframe pixel. A weighted average is used when updating the keyframe: each keyframe pixel tracks how many samples have been incorporated into it, and that number – saturated at 10 in our system – determines the weightings of the averaging procedure. The simplicity of this averaging scheme is offset by the use of multiple keyframes in the map representation when considered holistically. Moving objects can strain the balance of smoothing and responsiveness within a single keyframe, but considering multiple keyframes reconstructs a latent multi-hypothesis model of 3D occupancy which is an effective tool for filtering moving objects from the map. This mutual consistency check between keyframes is performed by UPSLAM at runtime whenever switching keyframes in order to reduce the impact of moving objects.

Once updated, a decision is made as to whether we should continue using the current keyframe. If the registration quality – defined as the ratio of ICP registration inliers to valid depth measurements in the sweep – has dropped below a configurable threshold, a new keyframe is sought. The pose graph is first consulted to identify the nearest keyframes to the current pose estimate to detect potential loop closures.

This configuration parameter was allowed to vary between experiments.
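A minimal sketch of the saturating weighted average described above; the array names and the accept_mask produced by the similarity filter are illustrative.

    import numpy as np

    def update_keyframe(key_depth, key_count, sweep_depth, accept_mask, cap=10):
        """Fold accepted sweep pixels into the keyframe with a running average.

        key_count holds per-pixel sample counts; saturating it at `cap` keeps
        the keyframe responsive to genuinely new observations.
        """
        w = key_count / (key_count + 1.0)                 # weight of the old estimate
        blended = w * key_depth + (1.0 - w) * sweep_depth
        key_depth = np.where(accept_mask, blended, key_depth)
        key_count = np.where(accept_mask, np.minimum(key_count + 1, cap), key_count)
        return key_depth, key_count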
Fig. 3: A 145 m trajectory through a cluttered lab space. The ceiling was removed for visualization clarity.

Since these loop closure registrations are likely to occur under the influence of significant drift, a grid search is undertaken in which several hundred transformations are evaluated with low-resolution projective alignment checks. The best candidate found by this search is promoted to a full-resolution ICP registration. If the registration quality is higher than the threshold for creating a new keyframe, the pose graph is updated with an edge between the current keyframe and the neighbor, and that neighbor becomes the new current keyframe. Otherwise, if that ratio is higher than 75% of the new keyframe inlier ratio, the pose graph is updated with the transformation computed from the registration, but the current keyframe is not switched. We refer to these as strong and weak loop closures: both varieties improve the global accuracy of the map, but strong loop closures additionally reduce the number of keyframes created when a location is re-visited. The pose graph structure is reflected in GTSAM [16], which is used to globally optimize inter-keyframe transformations on the CPU. Finally, a complementary filter [17] on the IMU data is used to maintain a smoothed estimate of Earth's gravity vector, which is attached to each keyframe as it is added to GTSAM. Observations of this shared phenomenon help minimize the effects of errors in pitch, which are relatively common given the limited vertical field of view of the sensor.
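One plausible reading of the grid search described above, sketched with illustrative grid bounds (the paper does not specify the search ranges). It reuses the project_to_panorama helper from the data-intake sketch and scores candidates by low-resolution depth agreement; the winner would then seed full-resolution ICP.

    import itertools
    import numpy as np

    def rot_z(yaw):
        c, s = np.cos(yaw), np.sin(yaw)
        T = np.eye(4)
        T[:2, :2] = [[c, -s], [s, c]]
        return T

    def best_loop_candidate(low_res_key, sweep_pts, yaw_steps=36, xy_steps=5,
                            xy_range=2.0, depth_tol=0.5):
        """Score a yaw/translation grid by depth agreement in low-res panoramas."""
        best_T, best_score = None, -np.inf
        offsets = np.linspace(-xy_range, xy_range, xy_steps)
        for yaw in np.linspace(-np.pi, np.pi, yaw_steps, endpoint=False):
            for dx, dy in itertools.product(offsets, repeat=2):  # 36*25 = 900 candidates
                T = rot_z(yaw)
                T[0, 3], T[1, 3] = dx, dy
                moved = sweep_pts @ T[:3, :3].T + T[:3, 3]
                pano, _ = project_to_panorama(moved, width=256, height=32)
                both = (pano > 0) & (low_res_key > 0)
                score = (np.abs(pano[both] - low_res_key[both]) < depth_tol).mean() \
                        if both.any() else 0.0
                if score > best_score:
                    best_T, best_score = T, score
        return best_T, best_score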
IV. Qualitative Breadth
A. University - Indoor
A first test of the mapping software is a 145 m walk around a lab space, moving at walking pace. The map shown in Figure 3 was captured with 50 keyframe images occupying 80 MiB on disk. While the motion here is benign, the abundance of glass around the angular section at the bottom of the figure, along with a few narrow passages, provides some challenge due to the sensor's inability to reliably detect glass or walls closer than around 50 cm.

Fig. 4: A running loop around the Pennovation campus with the sensor carried overhead.
B. University - Outdoor
The sensor was then taken outside, carried overhead, and run along a 375 m loop around a cluster of buildings on a university campus at running pace (roughly 3 m/s). The resulting point cloud, shown in Figure 4, benefits from the single loop closure formed when the sensor came within approximately 10 m of its starting location near the trees in the lower part of Figure 4a. The running pace was chosen to stress non-planar motion with high-amplitude rotations due to shaking and bouncing.
C. Farm
A Clearpath Robotics Jackal, Figure 5a, was driven along a 731 m route along a driveway, dirt trail, and grass on a farm, Figure 5b. This excursion presented several challenges due to the rough terrain and varied surroundings. Tall vegetation waved in the breeze, while the pond itself offered minimal range measurements. A steep drop to the side of the raised trail around the pond prevented the sensor from seeing any nearby ground (the left of Figure 5c). The path taken involved mud, logs, and vegetation that the vehicle could not always pass directly through, so the data collection included intermittent occlusion of the sensor, along with sharp accelerations to get the vehicle un-stuck.
D. Underground
A Ghost Robotics Vision 60 quadruped robot was walked along a 220 m trajectory in an old coal mine, capturing the map shown in Figure 6 with just 44 panoramic images. This experiment provided a substantially different motion model than those previously described, as the wet gravel floor of the mine meant that the robot had to continually work to maintain its balance. Dripping water, mist, and dust contribute a substantial amount of noise to individual lidar sweeps. Additionally, the tunnels visited ranged from wide access to the main elevator, bottom-center of Figure 6c, to a narrow, angular tunnel, Figure 6b, visible on the right side of the point cloud. The map is constructed despite the presence of short bouts of uncertain footing for the robot due to changes in the wet, sloppy floor material, and a trajectory through the narrowest of walkways that involved the robot bumping from wall to wall.

Fig. 5: The Miller family farm in central Pennsylvania (b). The 731 m on- and off-road trajectory around the pond is captured with geometric detail, including individual rails on gates and fences (d).
E. Aerial
The Wharton State Forest in New Jersey is an expansive wilderness with dense tree cover. Flying a quadrotor, Figure 7a, in this environment is challenging for many reasons, among which is the challenge of building a map in a region with no large planar surfaces. Occlusion edges, too, can be difficult to work with, as they are virtually all tree trunks without a geometrically simple background. One moment a sight line passing the edge of a tree trunk is too long for the lidar to reliably measure, but the next moment, from a slightly different angle, the same tree trunk partially occludes another nearby tree. UPSLAM's use of dense surface measurements extracts as much as possible from the sensor, and it is able to track its motion along a 117 m trajectory while producing a map populated with crisply defined trees, Figure 7. This map is captured in 30 keyframe images occupying 40 MiB on disk, sufficient to produce the point cloud shown in Figure 7c. The union of panoramic images employed by UPSLAM is able to efficiently capture this unstructured 3D geometry.

Data courtesy of Treeswift.

Fig. 6: A quadrupedal robot (a) was walked around a coal mine (b) along a 220 m trajectory. The resulting map (c) was captured with 44 images.

Fig. 7: A quadrotor was flown along a 117 m trajectory in a dense forest. The entire environment (b) is devoid of manmade structure, but the trunks and branches of the many trees are cleanly captured in the point cloud (c).

Fig. 8: A 4.4 km route driven through Morgantown, Pennsylvania, highlighted in Google Earth (a), seen from street level (b), and viewed as a point cloud produced by UPSLAM (c).
F. Automotive
While each of the previous experiments subjected the system to different motion models, the fastest the sensor was moved was about 3 m/s, when carried by a running human. The Ouster OS1-64 was scanning at 10 Hz for all of these data collections, meaning that a single sweep would see the sensor moved by up to 30 cm. This section presents a dramatically different scenario: the sensor was mounted atop an automobile and driven 4.4 km through a town at up to 50 km/h (31 miles per hour), Figure 8.

This fast-moving data set prompted 70 loop closures, and produced 1045 keyframe images occupying 1220 MiB on disk. Map construction involved 20 GiB of raw, uncompressed keyframe data at runtime; however, UPSLAM is able to stream keyframes between disk and GPU memory as needed, allowing us to cap the resident set at 3 GiB of GPU memory. Because the sensor was moving so quickly – up to 1.4 m per sweep – there is far less redundancy in the collected lidar point cloud, limiting opportunities for compression, moving object filtering, or even surface smoothing. However, considerable structural detail is still apparent despite the presence of other vehicles on the road, Figure 9. Shown are renderings of the road at the top right of the trajectory shown in Figure 8, viewed from the right edge looking left. An opaque point cloud rendering highlights depth, Figure 9a, while the same point cloud with lowered opacity demonstrates fine geometric detail, Figure 9b. Rendering the scene with a real time ray marched lighting model, Figure 9c, provides evidence that the map truly captures the three dimensional scene with surface shading, shadows, and occlusion.

A second configuration parameter was used for this experiment to enable a zero acceleration motion prior, as opposed to the default zero velocity prior used in all other experiments, which involve more sudden reversals of velocity.

Fig. 9: A lower view of the Morgantown dataset shows some of the structural detail picked up by the single sensor moving at road speeds.
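The keyframe streaming described above behaves like a byte-budgeted least-recently-used cache. The sketch below is our illustration, not UPSLAM's implementation; load_fn and save_fn are stand-ins for disk I/O on numpy arrays.

    from collections import OrderedDict

    class KeyframeCache:
        """Keep at most `budget` bytes of keyframes resident; evict LRU to disk."""

        def __init__(self, load_fn, save_fn, budget=3 << 30):   # 3 GiB budget
            self.load_fn, self.save_fn = load_fn, save_fn       # disk I/O stand-ins
            self.budget = budget
            self.resident = OrderedDict()                       # id -> numpy array

        def get(self, kid):
            if kid in self.resident:
                self.resident.move_to_end(kid)                  # mark recently used
            else:
                self.resident[kid] = self.load_fn(kid)          # stream in from disk
                self._evict()
            return self.resident[kid]

        def _evict(self):
            while sum(a.nbytes for a in self.resident.values()) > self.budget:
                kid, arr = self.resident.popitem(last=False)    # least recently used
                self.save_fn(kid, arr)                          # write back to disk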
V. Quantitative Depth
The Newer College Dataset [18] provides an opportunity to quantify trajectory accuracy. Point clouds captured by an Ouster OS1-64 were registered against a point cloud collected by a tripod-mounted, survey grade Leica BLK360 scanner. A sense of the space may be found in the point cloud rendering shown in Figure 10a, showing the quad on the lower right, the midsection, and then the park on the left. The sensor was carried on a circuitous route making two full laps around the college grounds.

This is the short_experiment of the dataset.

A. Trajectory Accuracy
Using time stamps to align the UPSLAM and ground truth trajectories shows broad agreement, Figure 11, with most of the discrepancy introduced around a particular corner of the park area visible at the bottom-right of the figure. The temporally aligned trajectories provide a set of point correspondences. We use this correspondence set to compute a 3D transformation that optimally aligns the trajectories with respect to the squared Euclidean distance between the members of each correspondence. The correspondences aligned in this way have a median distance of 0.62 m along the entire trajectory. This metric is similar to the Absolute Trajectory Error (ATE) [19], and is designed to show global consistency between trajectories. The ATE uses the root mean square error rather than the median, which results in an ATE of 0.77 m for this trajectory alignment, or 0.05% of the total trajectory length.
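This evaluation can be reproduced with a standard closed-form rigid alignment (Kabsch/Umeyama without scale) followed by the median and RMSE error statistics. The sketch assumes time-matched (N, 3) position arrays; it is our illustration of the metric, not the authors' evaluation script.

    import numpy as np

    def align_and_score(est, gt):
        """Rigidly align est to gt (both (N, 3), time-matched) and report errors."""
        mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
        H = (est - mu_e).T @ (gt - mu_g)                 # 3x3 cross-covariance
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T                               # optimal rotation, no reflection
        t = mu_g - R @ mu_e
        err = np.linalg.norm(est @ R.T + t - gt, axis=1)
        return np.median(err), np.sqrt(np.mean(err ** 2))  # median distance, ATE (RMSE)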
Fig. 10: An overhead view of a point cloud representing New College, Oxford University (a) provides a sense of scale, while dropping to ground level between trees and a wall in the park section (b) shows geometry density.

Fig. 11: Comparison of ground truth and UPSLAM trajectories for the Newer College Dataset (X and Y positions in meters).

Even with the optimal alignment of the two trajectories, the two visits to the lower right of Figure 11 are visible as peaks in trajectory error, Figure 12.
B. Computational Performance
The NVIDIA Jetson AGX Xavier embedded platform may be configured to run with various limits on its power consumption. These limits are observed by lowering the clock frequencies of the CPU and GPU, and by disabling some of the eight CPU cores. The highest power mode runs at the maximal clock frequencies, and is limited only by the 65 W power adapter. In this configuration, we consider high resolution, 2048x256, and low resolution, 1024x128, keyframes, Figure 13. The next power limits we consider are 30 W and 15 W, detailed in Table I.

Fig. 12: Error over time between UPSLAM and ground truth for the Newer College Dataset.

TABLE I: Jetson Xavier Power Modes

    Mode   CPU Cores   CPU MHz   GPU MHz   Memory MHz
    Max    8           2265.6    1377      2133
    30W    6           1450      900       1600
    15W    4           1200      670       1333

The time taken to incorporate a new lidar sweep into the map is shown for each configuration in Figure 13. The median time taken with the Xavier's maximal power budget and high resolution keyframes is less than 10 ms. The figure shows a kernel density estimate of update times taken while processing the Newer College Dataset to reflect the fact that not every update takes the same amount of time: some updates trigger new keyframe creation, and some may result in relatively slow graph updates due to loop closures. This long tail of update times is more apparent when considering low resolution keyframes with the maximal, 30 W, and 15 W power budgets, which resulted in median update times of under 10 ms, 13 ms, and 20 ms, respectively. Inputs are buffered when graph optimization takes longer than the interval between sweeps. The Xavier running at full power with this workload exhibits similar performance to NVIDIA's Quadro P1000 or GTX 1050 GPUs, hardware often found in laptop computers.
VI. CONCLUSIONS
The UPSLAM mapping system has been qualitatively evaluated across diverse environments spanning indoor, outdoor, underground, aerial, and public road settings. These challenging mapping scenarios involving mud, robot collisions, dense foliage, bodies of water, and road traffic were all handled by the same mapping software adjusted with only two configuration parameters. Beyond qualitative checks for functionality, accuracy was quantified with a benchmark dataset, demonstrating an absolute trajectory error of 0.05% of the total trajectory length. Performance was measured at a median update time of less than 10 ms on an embedded compute platform. Workspace scaling has been demonstrated by navigating narrow corridors and tunnels, as well as multi-kilometer roadways. Reconstructed geometry quality has been shown with real time visualizations supporting rich lighting effects, such as accurate shadows relating the mutual visibility of centimeter-scale geometry at ranges of hundreds of meters. We believe that this effort pushes the boundaries of robustness and performance for SLAM software, while demonstrating the effectiveness of the graph-of-panoramas map representation and the design choice to use all available range data during registration.

The 15 and 30 W power levels tested are models 2 and 4 of NVIDIA's nvpmodel utility for the Xavier AGX.

Fig. 13: Kernel density estimate of time to compute SLAM updates on an NVIDIA Jetson AGX Xavier at different keyframe resolutions and power limits (High Res; Low Res; Low Res, 30W; Low Res, 15W).
References

[1] C. Cadena, L. Carlone, H. Carrillo, Y. Latif, D. Scaramuzza, J. Neira, I. Reid, and J. J. Leonard, "Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age," IEEE Transactions on Robotics, vol. 32, no. 6, pp. 1309–1332, 2016.
[2] M. Bosse, P. Newman, J. Leonard, and S. Teller, "An atlas framework for scalable mapping," in IEEE International Conference on Robotics and Automation, 2003, pp. 1899–1906.
[3] M. Meilland and A. Comport, "On unifying key-frame and voxel-based dense visual SLAM at large scales," in International Conference on Intelligent Robots and Systems, 2013.
[4] M. Meilland, A. Comport, and P. Rives, "Dense omnidirectional RGB-D mapping of large-scale outdoor environments for real-time localization and autonomous navigation," Journal of Field Robotics, vol. 32, no. 4, pp. 474–503, June 2015. [Online]. Available: https://hal.inria.fr/hal-01010429
[5] C. J. Taylor, A. Cowley, R. Kettler, K. Ninomiya, M. Gupta, and B. Niu, "Mapping with depth panoramas," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sept 2015, pp. 6265–6272.
[6] J. Zhang and S. Singh, "LOAM: Lidar odometry and mapping in real-time," in Robotics: Science and Systems Conference (RSS), 2014. [Online]. Available: https://doi.org/10.15607/rss.2014.x.007
[7] ——, "Laser-visual-inertial odometry and mapping with high robustness and low drift," Journal of Field Robotics, vol. 35, no. 8, pp. 1242–1264, 2018. [Online]. Available: https://doi.org/10.1002/rob.21809
[8] T. Shan and B. Englot, "LeGO-LOAM: Lightweight and ground-optimized lidar odometry and mapping on variable terrain," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018, pp. 4758–4765.
[9] R. Mur-Artal, J. M. M. Montiel, and J. D. Tardós, "ORB-SLAM: A versatile and accurate monocular SLAM system," IEEE Transactions on Robotics, vol. 31, no. 5, pp. 1147–1163, Oct 2015.
[10] J. Engel, V. Koltun, and D. Cremers, "Direct sparse odometry," IEEE Transactions on Pattern Analysis and Machine Intelligence, Mar 2018.
[11] H. Liu, C. Li, G. Chen, G. Zhang, M. Kaess, and H. Bao, "Robust keyframe-based dense SLAM with an RGB-D camera," CoRR, vol. abs/1711.05166, 2017. [Online]. Available: http://arxiv.org/abs/1711.05166
[12] H. Dammertz, D. Sewtz, J. Hanika, and H. Lensch, "Edge-avoiding À-trous wavelet transform for fast global illumination filtering," in Proceedings of High Performance Graphics, 2010, pp. 67–75.
[13] P. Besl and N. D. McKay, "A method for registration of 3-D shapes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 2, pp. 239–256, 1992.
[14] Y. Chen and G. Medioni, "Object modelling by registration of multiple range images," Image and Vision Computing, vol. 10, no. 3, pp. 145–155, Apr. 1992. [Online]. Available: http://dx.doi.org/10.1016/0262-8856(92)90066-C
[15] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohli, J. Shotton, S. Hodges, and A. Fitzgibbon, "KinectFusion: Real-time dense surface mapping and tracking," in Proceedings of the 2011 10th IEEE International Symposium on Mixed and Augmented Reality, ser. ISMAR '11. Washington, DC, USA: IEEE Computer Society, 2011, pp. 127–136. [Online]. Available: http://dx.doi.org/10.1109/ISMAR.2011.6092378
[16] F. Dellaert, "Factor graphs and GTSAM: A hands-on introduction," 2012.
[17] R. G. Valenti, I. Dryanovski, and J. Xiao, "Keeping a good attitude: A quaternion-based orientation filter for IMUs and MARGs," Sensors, vol. 15, no. 8, pp. 19302–19330, 2015.
[18] M. Ramezani, Y. Wang, M. Camurri, D. Wisth, M. Mattamala, and M. Fallon, "The Newer College Dataset: Handheld LiDAR, inertial and vision with ground truth," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020.
[19] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, "A benchmark for the evaluation of RGB-D SLAM systems," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2012.