[PDF] SRM: An Efficient Framework for Autonomous Robotic Exploration in Indoor Environments

Abstract

In this paper, we propose an integrated framework for the autonomous robotic exploration in indoor environments. Specially, we present a hybrid map, named Semantic Road Map (SRM), to represent the topological structure of the explored environment and facilitate decision-making in the exploration. The SRM is built incrementally along with the exploration process. It is a graph structure with collision-free nodes and edges that are generated within the sensor coverage. Moreover, each node has a semantic label and the expected information gain at that location. Based on the concise SRM, we present a novel and effective decision-making model to determine the next-best-target (NBT) during the exploration. The model concerns the semantic information, the information gain, and the path cost to the target location. We use the nodes of SRM to represent the candidate targets, which enables the target evaluation to be performed directly on the SRM. With the SRM, both the information gain of a node and the path cost to the node can be obtained efficiently. Besides, we adopt the cross-entropy method to optimize the path to make it more informative. We conduct experimental studies in both simulated and real-world environments, which demonstrate the effectiveness of the proposed method.

Full PDF

SSRM: An Efﬁcient Framework for Autonomous Robotic Exploration inIndoor Environments

Chaoqun Wang , Delong Zhu , Teng Li , Max Q.-H. Meng ∗ , and Clarence De. Silva Abstract — In this paper, we propose an integrated frameworkfor the autonomous robotic exploration in indoor environments.Specially, we present a hybrid map, named Semantic RoadMap (SRM), to represent the topological structure of theexplored environment and facilitate decision-making in theexploration. The SRM is built incrementally along with theexploration process. It is a graph structure with collision-freenodes and edges that are generated within the sensor coverage.Moreover, each node has a semantic label and the expectedinformation gain at that location. Based on the concise SRM,we present a novel and effective decision-making model todetermine the next-best-target (NBT) during the exploration.The model concerns the semantic information, the informationgain, and the path cost to the target location. We use thenodes of SRM to represent the candidate targets, which enablesthe target evaluation to be performed directly on the SRM.With the SRM, both the information gain of a node and thepath cost to the node can be obtained efﬁciently. Besides, weadopt the cross-entropy method to optimize the path to makeit more informative. We conduct experimental studies in bothsimulated and real-world environments, which demonstrate theeffectiveness of the proposed method.

I. INTRODUCTIONAutonomous exploration has an increasing number ofapplications such as disaster relief, search and rescue, andenvironment monitoring. It is a fundamental capability forthe mobile robot to gather information of an unknownenvironment. The primary goal is to autonomously gatherrequired information with limited time or resource budget,which poses great challenges for exploration strategies.Various past methodologies have been proposed to tacklethe autonomous exploration problem, among which the mostinvestigated approach is the nearest frontier method [1].The frontier in this method is deﬁned as the boundarybetween the known and unknown regions on the incompletemap. In this scheme, the robot always pursuits the nearestfrontier to explore. Besides the frontier-based method, theinformation gain is also frequently adopted in the context ofautonomous exploration [2], where the NBT is determined byevaluating the amount of information gain at the target areas.Typically, these approaches use the metric map to representthe environment and extract the required information, whichcan be time-consuming in large-scale environments. ∗ Corresponding Author Chaoqun Wang, Delong Zhu and Max. Q. H. Meng are withThe Chinese University of Hong Kong, China. { cqwang, dlzhu,qhmeng } @ee.cuhk.edu.hk Teng Li and Clarence W. de Silva are with TheUniversity of British Columbia, Vancouver, BC, Canada. { tengli,desilva } @mech.ubc.ca OfficeInfo: LaboratoryInfo:CorridorInfo:

Fig. 1. SRM during the exploration. The blue lines and the colourfulpoints represent the edges and the nodes of the graph, respectively. Thereis a semantic label and the information gain for each node.

In contrast with the majority of frontier or informationbased methods, little effort has been devoted to developingspecial-designed path planning methods for the exploration.However, an efﬁcient path planner is a prerequisite for thetarget evaluation involving path cost. Sampling-based pathplanning method [3] is a broadly adopted approach in theexploration. It is more efﬁcient with respect to the large-scaleor high-dimensional environments. However, these planningmethods are not historical-aware. A path will be replannedusing these methods on the whole map when the robot needsto backtrack to some places that is previously visited withoutfurther exploitation. Besides, the previous planning methodspay more attention to the path cost without considering theinformation collected along the path.In this paper, we propose to use a novel hybrid map,the SRM, to guide the exploration. This topological mapis a graph structure that is built incrementally over thecourse of exploration. As shown in Fig. 1, the node onthe graph structure has two attributes: the semantic labeland the information gain at the corresponding location. TheNBT selection is performed directly on this topological map.Based on the SRM, we propose a hierarchy decision-makingstrategy, where the high-level decision is made according tothe semantic information, and the low-level decision makingis concerned with the path cost and the information gain.Taking advantage of such a process, not only can we getthe information gain efﬁciently comparing with computingon the whole metric map, but we can also query a path costdirectly on the graph structure. Furthermore, the queried pathis optimized using the cross-entropy method. Hence the robotcan get the most information along the generated trajectory. a r X i v : . [ c s . R O ] D ec he experimental results demonstrate that our method canexplore the unknown environment more efﬁciently than theconventional methods.The remaining of this paper is organized as follows.In Section. II, we introduce some related work about au-tonomous exploration. Then in Section. III, we give somepreliminaries about our problem. After that, we introduceour methodology in Section. IV. Our experiments results arepresented in Section. V. Finally, we give the conclusion andfuture work in Section. VI.II. R ELATED WORK

A large number of related approaches are developed tocope with different aspects of the autonomous explorationproblem [4]. In this section, we mainly focus on threetightly coupled aspects that our framework is built uponin the autonomous exploration. To start with, we give anoverview of the map representation in robotic navigation.Then we describe different decision-making strategies inthe exploration process. Finally, we review the literature oninformative path planning during the exploration.

A. Map representation

The study of map representation in robot tasks is oneof the most active areas in robotics research. Generallyspeaking, the map can be divide into four groups: the metricmap, the topological map, the semantic map, and the hybridmap [5]. Various approaches have been proposed to enablethe robot navigation with different kinds of maps [6], butfew studies have been focused on the map representationduring the exploration. Choset et al. [7] proposed a sensorbased hierarchically generated Voronoi graph to represent theenvironment. One can query a path efﬁciently and completelyon the graph during the exploration. Rezanejad et al. [8]suggest to use the topological map generated by the ﬂuxskeleton based method on the occupancy grid map, where theskeleton is generated online by utilizing image-based methodinstead of constructing incrementally with the exploration.

B. Guidance of exploration

NBT evaluation is another critical issue in autonomousexploration. Two frequently used metrics for determiningthe NBT are the nearest frontier and the information gainof the candidate targets [9][10]. Shen et al. [11] improve theexploration efﬁciency through evaluating the NBT accordingto a stochastic differential equation-based method. Charrow et al. proposed to use the information-theoretic objectivefunction to guide the exploration. More recently, Umari etal. [12] present an exploration method based on rapidlyexploring random tree (RRT). The frontier regions can bedetected efﬁciently through sampling based method. Besides,some novel metrics are also proposed for the evaluationof NBT. Dornhege et al. [13] present a frontier-void-basedmethod that uses both the frontier and the unexplored 3Dvolumes to guide the exploration. Visser et al. [14] proposeto balance the path cost and the information gain during theexploration. In recent years, there has been growing interests in mak-ing the autonomous exploration more intelligent. Owald etal. [15] present an exploration strategy by exploiting thestructure of the environment. Semantic information has beenwidely used for the robot navigation as well [16]. Withsemantic reasoning, a robot can achieve human-like per-formance. Moreover, some learning-based methods towardautonomous exploration have been proposed along with thedevelopments in deep learning approaches [17].

C. Informative path planning

Path planning is of great importance during the explo-ration but most work mainly pays attention to the motionconstraints without considering the information gatheredalong the path. Heng et al. [18] propose an algorithm forvisual exploration considering both the exploration and thecoverage. The robot is pruning to select a path with moreinformation. Davis et al. [19] present a coverage aware pathplanning method. This approach plans a path from a startpoint to the goal while maximizing the user deﬁned regions.Tan et al. [20] propose an information gathering methodbased on cross-entropy. The trajectory is optimized using thecross-entropy method to maximize the objective function ininformation gathering.III. PRELIMINARIESOur main goal is to build a precise metric map of anunknown environment with as less path cost and explorationtime as possible. We use the occupancy grid map to modelthe environment. The map, denoted as M o ⊂ R , encodesthe probability of being occupied for every grid. Supposethere are m candidate regions { x , x , ..., x m } to be ex-plored, the information gain I i of the candidate region x i is obtained using M o . The entropy over the occupancy gridmap M o is denoted as: H ( M o ) = − (cid:88) i (cid:88) j p ( c i,j ) log p ( c i,j ) , (1)where c i,j ∈ M o is the grid cell of the map at location { i, j } .We use mutual information to represent the information gainfor a sensing location x i , which is deﬁned as: I ( M o ; x i ) = H ( M o ) − H ( M o | x i ) (2)We propose to use the SRM for the decision-makingprocess over the course of exploration. The SRM, denotedas G = { V, E } , is the topological map of the exploredenvironment. It is a graph structure with V representingthe vertices and with E represents the edges that connectthe vertices. For every v i ∈ V on the graph G , it has twoattributes: the semantic label S and the information gain I i during the exploration. A semantic mapping process ismaintained and the semantic label is continually attached tothe nodes of the graph. The nodes on the graph are used toindicate the candidate regions to be explored, i.e. v i ≡ x i .The path ˆ P from the robot current location x o to the target emantic mapping ( 𝑀 𝑠 ) Metricmapping ( 𝑀 𝑜 ) Topological mapping ( 𝐺 )LaserRGBD TrajectoryoptimizationPathfinding Decision making Fig. 2. Overall framework of our proposed SRM method. Details can befound in the text. area x i can be queried efﬁciently on the graph G . Then thepath cost (cid:96) ( ˆ P ) is calculated by: (cid:96) ( ˆ P ) = L ( ˆ P , M o ) . (3)The path ˆ P will be optimized with respect to M o utiliz-ing the function L ( · ) , which is the trajectory optimizationmethod based on cross-entropy.IV. M ETHODOLOGY

A. Method Overview

To build a precise metric map of the environment, webuild a topological map incrementally during the course ofexploration. As Fig. 2 shows, the mapping process maintainsthree maps: the occupancy grid map M o , the semantic map M s , and the graph structure G . The node on the graph G contains the semantic label provided by M s and theinformation gain queried on M o . Robot can ﬁnd a pathefﬁciently on the graph G , which is then optimized bythe cross-entropy based method. The determination of NBTinvolves not only the path cost and the information gain, butalso the high-level semantic information. All these evaluationmetrics are combined in the decision-making process toguide the robotic exploration. B. Semantic Road Map1) Tree construction:

The graph structure G = { V, E } is built incrementally along with the exploration process.Initially, there is only one original node root at the startingpoint where the robot starts to explore the environment, i.e. V = { v root } , E = ∅ . The pseudo code of building the graphstructure is shown in Alg. 1. We propose to use the samplingpoints under the sensor scope as the candidate vertices forbuilding the graph. As shown in Fig. 3, the laser beamsindicate the current sensor scope. Function Random ( · ) takesas input the sensor scope and generate sampling points V rand , as indicated by the yellow points in Fig. 3. For arandom point v r ∈ V rand , it will ﬁnd the nearest vertex ˆ v ongraph G according to: ˆ v = argmin v i ∈ ( V \ V (cid:48) ) || v i − v r || , V ∈ G , (4) Algorithm 1:

Semantic Topological Map Generation

Input: laserMsg, rgbdMsg

Output: semantic road map G while existing unexplored area do M ← metricM apping ( laserM sg ) ; V.pushback ( start node ) ; //generate candidate vertices ; for n th ray of LaserMsg do v rand = Random [0 , laserM sg.dist ( n th )] ; V rand .pushback ( v rand ) ; end //check validity of edges ; for i = 1 to Size ( V rand ) do rank ( V rand [ i ] , V, ascend ) ; for j = 1 to Size ( V ) do edge = Line ( V rand [ i ] , V [ j ]) ; if CheckValidity(edge, M ) then E.pushback ( edge ) ; V rand [ i ] .label = Label ( rgbdM sg, V rand [ i ]) ; V.pushback ( V rand [ i ]) ; break; // to simplify the graph end end end end Laser beamRobotSampling Points Obstacle

Fig. 3. Sampling under a sensor coverage. The yellow points represent thesampled points under the current sensor scope. where V (cid:48) is the set of the vertex on the graph and theconnection from v i ∈ V (cid:48) to v r is not collision free. If V \ V (cid:48) = ∅ , then v r is not a feasible random point since v i does not exist. Function Random ( · ) will be called severaltimes before a feasible sample point is obtained. Then, thefeasible point v r connect the nearest vertex in the graph G .The random point v r and the edge e that connects ˆ v and v r are added to the existing graph. Furthermore, each effectivevertex v r is attached with a semantic label which correspondsto the room type detected by the semantic mapping pipeline.However, if l = || ˆ v − v r || is less than a threshold, thenthe random point will be dropped and a new iteration willstart. Line 10-22 in Alg. 1 describe the details of adding avertex to the graph. The algorithm runs online along withthe exploration process. BC Unknown OccupiedFree Detected FrontierRay-casting Scope FrontiersRobot

Nodes

Edges

Fig. 4. Efﬁcient frontier detection. The red points indicate the nodes andthe blue lines represent the edges on the graph. The dotted circle representsthe scope of the ray-casting method.

We use A* to search a global path ˆ P on the existinggraph G . Since we use the node on the graph to represent thetarget, it is simple and straightforward to get a sparse path onthe graph. Some edges on the graph may become infeasiblealong with the exploration process. The reason for this isthat the collision checking is performed on the incompleteoccupancy grid map. The edges that lie in the unknown areain the map is assumed to be collision-free, but sometimesthere could be collision in the unknown area. In our proposedmethod, we check the feasibility of the path only if a pathis generated. If the generated path is not feasible, i.e. one ormore edges for constructing the graph are not collision free,we delete those edges on the graph and restart the path-ﬁnding process.

2) Efﬁcient frontier detection:

We propose an efﬁcientfrontier detection method based on the observation thatthe frontier cells are always appears grouply, as Fig. 4shows. When one frontier cell f is found, then the adjacentcell may also the frontier cell. A frontier clusters can beobtained efﬁciently through exploring the surrounding areaof a frontier f . To get the the cell f , the nodes on G are usedfor the evaluation. The graph G exists and extends towardsthe collision free area, while the frontier cells lie betweenthe collision free area and the unknown regions. Hence thefrontier point can be found near the nodes on the graph G inherently. We exploit the vertices on the graph G to ﬁnd thefrontiers over the course of exploration.As shown in Fig. 4, The shaded square indicates thefrontier cells around the nodes detected by the ray-castingmethod. Two frontier cells are detected around node B on the G . Note that there is another frontier cell that isnot detected by the raycasting method. This cell can befound efﬁciently through checking the neighbors of detectedfrontier cells. Hence all the frontier cells can be obtainedthrough this method. Unlike the conventional method thatchecks all the cells to get the frontiers. our approach is moreefﬁcient because we only need to check the cells aroundthe nodes on the graph. To further improve the efﬁciency ofour method, we divide the nodes on the graph G into twogroups: the closed nodes V closed and working nodes V open .The closed nodes are those no frontiers can be detected with.If there is no frontier detected around a nodes using the ray- Algorithm 2:

Fast Frontier Detection.

Input: M , v ∈ { V \ V closed } Output: frontier set F P r = rayCasting ( v ) ; for m = 1 to Size ( P r ) do P cand = P r ( m ) ; while notEmpty( P cand ) do for n = 1 to Size ( P cand ) do if isF rontier ( P cand ( n )) then [ P n , F i ] = f indF rontierN eighbor ( P cand ( n ) , M ) ; P total .pushback ( P n ) ; F .pushback ( F i ) ; end end P cand = P total ; end end if isEmpty ( F ) then V closed .pushback ( v ) ; end casting method, then this nodes is closed. We will not checkthese nodes when performing frontier detection. This prunemethod is a simplify that can further improve the efﬁciencysince the ray-casting method can also be time consuming.The proposed frontier detection method is shown in Alg.2. A cluster of points P r can be found through function rayCasting around node v . For every point in the set P r , if it is a frontier cell, then we mark this point onthe map and check its neighbor. If the neighbors are alsofrontiers, this process repeats until all the frontiers in theneighborhood are found. Line 2-13 in Alg. 2 describes thefrontier detection process. This iteration ends when there isno frontier detected. This means that there is no frontierregions around the vertex v , we will put this vertex intoclosed set and will not check it in the following iterations.We use the amount of frontiers to quantify the expectedinformation gain I around a node. With the proposed frontierdetection method, we can obtain the information efﬁciently. C. Informative path planning

The path ˆ P connecting the start and the goal re-gion contains the vertice { v , v , ..., v n } and the edges { e , e , ..., e n − } . This coherent path is not optimal sincethe vertice are generated randomly without considering anyconstraints. In the proposed framework, the robot is requiredto gather more information when it moves from one point toanother. The path should not only be with less path cost butit should also be more informative. The goal is to computea trajectory that satisﬁes: P = arg min P C ( P ) s.t. F ( P , (cid:96) ( ˆ P )) ≥ , (5)here C ( P ) is the reward function over the path. Weconsider both the path cost and the information gain I along the path. Thus we deﬁne the reward function by: C = − I ( P , t ) e − λ(cid:96) ( P ) , as modeled in [21], where λ is apositive constant which determines trade-off between theexploration and exploitation, (cid:96) ( P ) is the trajectory length ofthe path, and I ( P , t ) is the information gain along the pathdeﬁned by Equation. 2. F ( P , (cid:96) ( ˆ P )) ≥ guarantees the path P is collision-free and satisﬁes the motion constrains of therobot. Besides that, this function gives the upper bound ofthe path cost of the ﬁnal path. (cid:96) ( ˆ P ) is the path length of thesparse path ˆ P . The optimized path length, P opt should satisfythe constraint that: (cid:96) ( P opt ) ≤ (cid:96) ( ˆ P ) . This constraint canhelps when the optimization of the path does no converge.Moreover, the information gain along a path is a functionabout time t , which means that the sensor scan at time t S t should consider the effect of its previous scan S t − . Thiscan help the robot avoid the local minimum in informationrich area using our proposed objective function.The objective function is optimized using the cross-entropy method. In what follows we will mainly focus onhow to optimize our problem using the cross-entropy method.We refer the interested reader to [20][22] for more details.Our development follows closely [20] using notation adaptedto our setting. The path P is parameterized by the function P = φ ( z ) , where z ∈ Z is the parameter space. The costfunction can then be deﬁned by: z = arg min z C ( φ ( z )) s.t. F ( φ ( z ) , (cid:96) ( ˆ P ))) ≥ , (6)This optimization in Equation. 6 can be solved by thecross-entropy method. Denote r ∗ as the optimal reward, i.e. r ∗ = min J ( z ) , (7)where J ( z ) = (cid:82) T C ( φ ( z )) is the accumulated reward. Theparameter space Z has a pdf p ( θ ) . Here in our case the pdfis a mixture of Gaussian, where the number of Gaussiancomponents, n, is determined by the vertices on the originalpath ˆ P . Equation. 7 can be formulated as: ζ = P θ ( J ( Z ) ≤ r ) = E θ [ I { J ( Z ) ≤ r } ] (8)The optimal pairs ( r, θ ) can be updated by:1) Sample with current pdf p ( θ ) to get { z , z , ..., z m } ,determine R = { ˆ r , ˆ r , ..., ˆ r m } by Equation. 62) Select the elite set R ∗ , compute the ρ -th quantile r j

3) Update θ by: θ = arg min θ | ε | (cid:88) Z k ∈ ε lnp ( Z k ; θ ) (9)This iteration will end until a certain termination conditionis fulﬁlled. Then we get an optimized trajectory P opt , whichis more informative with less path cost as well. Room1 Room2Room3Corridor1 Corridor2

Fig. 5. Decision-making with SRM. The semantic information of the nodeis used for high-level decision-making.

D. Decision making

The target locations during the exploration are alwaysencoded by the nodes on the graph G , hence we can easilyquery a path to the target on the graph G . The NBT isdetermined by the high level semantic information, the pathcost and the information gain.The semantic information can give the robot high-levelinformation for guiding the exploration. We divide the sceneinto two categories, the room R and the connection C . C is the place that connects multiple rooms, for example,the corridor. During the exploration, robot always tends toexplore the rooms instead of the corridors. As Fig. 5 shows,the robot ﬁrstly explores the Room 1, Room 2, and Room3 ﬁrst according to the distance, then the Corridor 1 andCorridor 2 according to the distance and the informationgain. Noticed that when there is no corridor, the explorationwill become a Travel Salesman Problem (TSP). Robot willtraverse all the rooms according to the effective methods ofTSP. Overall the decision is made differently in two cases: • Case 1:

If there are rooms detected in the environment,the robot will traverse these rooms according to thedistance from its current location. When a new roomcomes up during this process, the robot will replan thistraverse path every time it ﬁnishes exploring one room. • Case 2:

If there are only corridors in the environment,the robot will select one target according to the rewardfunction C (cid:48) = − I ( P opt ) e − λ(cid:96) ( P opt ) . Robot considersboth the path cost (cid:96) ( P opt ) and the information gain I ( P opt ) when selecting the target regions in this case.Robot will make decisions according to the two casesabove. In this simple but effective decision model, robot candistinguish the rooms from the connection places. Generallyspeaking, the robot is able to explore the room one by onewithout wasting much time in the corridor. This effectivelyhelp reduce the backtracking problem, in which the robot willcome back and forth to explore unﬁnished previous tasks.V. E XPERIMENTS AND R ESULTS

To demonstrate the beneﬁts of our proposed explorationframework, we conduct a number of experiments in both sim-ulated and real-world environments. The experimental stud-ies demonstrate the effectiveness of our proposed pipelinea) T=1s (b) T=32s (c) T=180s (d) T=230s

Fig. 6. Autonomous exploration with SRM in simulated environment at different points of time. The SRM is represented by the colourful nodes and theblue edges. The red stripe indicates the frontiers.

Exploration Time /s F r on t i e r D e t e c t i on T i m e / s nopruneprune Fig. 7. Frontier detection time using our proposed method with and withoutpruning. as well as the efﬁciency of each individual module of theproposed framework.We test our proposed framework in a typical simulatedindoor environment. As Fig. 6 shows, the blue line indicatesthe edges and the colorful points indicate the nodes ofthe graph. Different colors of the node represent differentsemantic labels. The frontier region, as marked by the redstripe in the Fig. 6(b)-Fig. 6(c), can be completely andefﬁciently detected using the generated SRM. The reasonis that a lot of nodes that are near the frontiers are generatedduring SRM buliding process. This means the frontier’s lo-cation information is already encoded by these nodes, whichmakes the frontier detection task easier when performing theray-casting method. This experiment demonstrates that ourproposed topological map, which is built incrementally alongwith the exploration process, can be used for fast frontierdetection and guiding the exploration.To further explore our frontier detection method, wecarried out several experiments. Firstly, we conduct ray-casting on all the nodes of the graph to get the frontierregions. This method is more accurate but it is more time-consuming. Note that there are no frontier regions aroundsome nodes on the graph along with the exploration process,so we propose to prune those nodes. The experiment isperformed in the simulated environment, as shown in Fig.7(a). Overall, our proposed method can always detect thefrontier region within 0.4s, as shown in Fig. 7(b). For themethod without pruning, the frontier detection time increasessigniﬁcantly along with the exploration, corresponding to the

TABLE IS

TATISTICS OF THE COMPARATIVE STUDY BETWEEN OUR PROPOSEDPLANNER AND THE

RRT*

PLANNER . Env Size ( m ) Our planner RRT*Nodes Time(ms) Time(ms)Map1 225 276 0.53ms 7.93msMap2 300 328 0.62ms 8.72msMap3 320 359 0.72ms 9.37msMap4 260 290 0.57ms 8.31ms increase of the number of nodes on the graph. The methodwith pruning is even more efﬁcient after pruning extra nodes.As a quantitative measurement, the red line in Fig. 7 indicatesthe frontier detection time after pruning extra nodes. As wecan see, the frontier detection time is increasing during theﬁrst 50 seconds. This is because there exists a lot of frontierregions and only few extra nodes can be pruned. However,the frontier detector can become increasingly efﬁcient withmore and more extra nodes being pruned.Robot can query a path efﬁciently on the topological map G . To demonstrate the efﬁciency of our proposed planner, weconduct a comparative study in four different environments.The size of each environment is reported in TABLE. I. Thecontrol variable is the planner used during the explorationand the nearest frontier based method is adopted for thedecision-making module in all the experiments. We chooseRRT* as the comparable planner. As reported in Table. I,our planner is more efﬁcient than the RRT* planner in all thefour experimental scenarios. The time spent on the test groupand the control group are both positively related to the mapsize. Moreover, our method can even better in large-scale orhigh-dimensional environments.The performance of our trajectory optimization method isdemonstrated in Fig. 9. The proposed objective function isoptimized using the cross-entropy method. As shown in Fig.9, the sparse trajectory is queried directly on the generatedgraph structure. The black line indicates the shortest pathsatisfying the motion constraints. The useful informationis evaluated by the unknown area covered by the robotsensor scope. The black path is short without consideringthe collected information along the path. Using our proposedcross-entropy based method, we can ﬁnd a path, as the redline shows in Fig. 9(a), along which the robot can collectmore information. With relatively longer path, the robot can

50 100 150 200

Time/s -14-12-10-8-6-4-20 E n t r op y / na t Nearest FrontierMax EntropyOurs

Map 1

Time/s -14-12-10-8-6-4-20 E n t r op y / na t Nearest FrontierMax EntropyOurs

Map 2

Time/s -14-12-10-8-6-4-20 E n t r op y / na t Nearest FrontierMax EntropyOurs

Map 3

Time/s -14-12-10-8-6-4-20 E n t r op y / na t Nearest FrontierMax EntropyOurs

Map 4

Fig. 8. Variations of map entropy along with the exploration in four different environment scenarios using our proposed method and two comparativemethods. (a)

Iterations C o s t (b) Fig. 9. (a) Trajectory optimization using our proposed method. The bluelines are the sparse path queried from the SRM. The black line is the shortestpath satisfying the motion constrains. The red line is the informative pathgenerated by our optimization method. (b) The variation of the objectivefunction along with the iterations during the optimization. explore more unknown areas when it executes the path fromthe start to the goal. Fig. 9(b) shows the variations of theobjective function at different steps of the iterations usingour cross-entropy based method. Obviously, the value of theobjective function is reduced and we can get an optimal pathafter a few iterations.We proposed to guide the exploration using the semanticinformation, the information gain and the path cost to thetarget. The experiments are performed in four environmentsto evaluate the efﬁciency of our exploration strategy. Wecompare our method with the Nearest frontier based methodand the Max Entropy method. The nearest frontier methodis a greedy method which always seeks the nearest frontierto explore during the exploration, while the Max Entropyselects the most informative area as the target to explore.Compared with these two methods, our method is alwaysmore efﬁcient in all the four environments, as shown in Fig.8. The entropy can be reduced to a smaller value with lesstime using our method. However, it is difﬁcult to select abetter one from the other two methods. The performances ofthese two methods are highly dependent on the environment.The statistics of the path length and time spending arereported in Fig. 10. Our proposed guidance strategy can ex-plore the environment with less time and path cost. Besides,the standard deviation of our method is also less than theother ones, which means our method works more steadily.There is no much difference between the performance of theNearest Frontier based method and the Max Entropy method.Furthermore, we can see from Fig. 10 that longer path meanslonger exploration time. This is because of the fact that mostexploration time is spent on the execution of the path. The

Map 1 Map 2 Map 3 Map 40100200300400500 T i m e / s Nearest FrontierMax EntropyOurs (a)

Map 1 Map 2 Map 3 Map 4050100150200 P a t hLeng t h / m Nearest FrontierMax EntropyOurs (b)

Fig. 10. Exploration time and path cost in four different environments withthree methods. less the path is, the less exploration time is needed. Ourproposed guidance strategy is useful in reducing the pathcost, as shown in Fig. 10. Besides, the proposed trajectoryoptimization method can also provide more information withless path cost, which in turn reduce the path length.To evaluate the efﬁciency of our proposed framework, wecompare our method with the method that considers boththe path cost and the information gain, in which the twocriteria are linearly combined weighted by two coefﬁcients γ and γ . The combined method uses the RRT* for the pathplanning during the exploration. Fig. 11 shows the experi-ment environment, where one typical exploration trajectoriesof our method and the combined method are illustrated.We compare our method with the combined method atthree different starting locations and the data is recorded in B C

Fig. 11. Real-world experimental study. The blue line and red line representthe exploration trajectory of the combined method (96.5m) and our method(72.2m), respectively. TABLE IIC

OMPARISONS WITH THE COMBINED METHOD AT DIFFERENT STARTINGLOCATIONS . Ours CombinedPath(m) Speed(m/s) Path(m) Speed(m/s)A 72.1 ± ± ± ± ± ± Table. II. As we can see, the performance of our proposedmethod is better than the comparison group in all thesethree scenarios.The combined method may achieve betterperformance with proper γ and γ . For example, there isno much difference of the mean path length when startingat location A. However, the combined method is not stable,which can be seen from the larger standard deviation of thepath length.VI. C ONCLUSION AND FUTURE WORK

In this paper, we propose a novel hybrid map, SRM,for autonomous exploration in indoor environments. Thishybrid map features the combination of the semantic mapand the topological map. It is a graph structure where anode contains the information gain and the semantic labelcorresponding to the node location. Both the informationgain and the path cost can be obtained efﬁciently with thisgraph structure. Moreover, we use the cross-entropy basedoptimization method to optimize the queried sparse path.This optimization provides a more informative trajectory.The semantic information, combined with the informationgain and the path cost, is used to guide the exploration. Ourframework is not a simple integration of several modulesbut a tightly coupled system. The modules in this systemcontribute to each other to make the exploration more efﬁ-cient. The experimental studies demonstrate that our methodis effective and more efﬁcient than the others.Our method currently focuses on the 2D environment.In the future, we will extend our method into 3D explo-ration area. We plan to use the octomap to represent theenvironment. Our proposed method can efﬁciently detect thefrontiers and plan a path in 3D environments. R

EFERENCES[1] Brian Yamauchi. A frontier-based approach for autonomous explo-ration. In

Computational Intelligence in Robotics and Automation,1997. CIRA’97., Proceedings., 1997 IEEE International Symposiumon , pages 146–151. IEEE, 1997.[2] Cyrill Stachniss, Giorgio Grisetti, and Wolfram Burgard. Informationgain-based exploration using rao-blackwellized particle ﬁlters. In

Robotics: Science and Systems , volume 2, pages 65–72, 2005.[3] Steven M LaValle. Rapidly-exploring random trees: A new tool forpath planning. 1998.[4] Angelos Mallios, Pere Ridao, David Ribas, Marc Carreras, andRichard Camilli. Toward autonomous exploration in conﬁned under-water environments.

Journal of Field Robotics , 33(7):994–1012, 2016.[5] Sebastian Thrun et al. Robotic mapping: A survey.

Exploring artiﬁcialintelligence in the new millennium , 1(1-35):1, 2002.[6] Kurt Konolige, Eitan Marder-Eppstein, and Bhaskara Marthi. Naviga-tion in hybrid metric-topological maps. In

Robotics and Automation(ICRA), 2011 IEEE International Conference on , pages 3041–3047.IEEE, 2011.[7] Howie Choset and Joel Burdick. Sensor-based exploration: Thehierarchical generalized voronoi graph.

The International Journal ofRobotics Research , 19(2):96–125, 2000.[8] Morteza Rezanejad, Babak Samari, I Rekleitis, Kaleem Siddiqi, andGregory Dudek. Robust environment mapping using ﬂux skeletons. In

Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ InternationalConference on , pages 5700–5705. IEEE, 2015.[9] Matan Keidar and Gal A Kaminka. Efﬁcient frontier detection forrobot exploration.

The International Journal of Robotics Research ,33(2):215–236, 2014.[10] Maani Ghaffari Jadidi, Jaime Valls Miro, and Gamini Dissanayake.Gaussian processes autonomous mapping and exploration for range-sensing mobile robots.

Autonomous Robots , 42(2):273–290, 2018.[11] Shaojie Shen, Nathan Michael, and Vijay Kumar. Autonomous indoor3d exploration with a micro-aerial vehicle. In

Robotics and Automation(ICRA), 2012 IEEE International Conference on , pages 9–15. IEEE,2012.[12] Hassan Umari and Shayok Mukhopadhyay. Autonomous roboticexploration based on multiple rapidly-exploring randomized trees. In

Intelligent Robots and Systems (IROS), 2017 IEEE/RSJ InternationalConference on , pages 1396–1402. IEEE, 2017.[13] Christian Dornhege and Alexander Kleiner. A frontier-void-basedapproach for autonomous exploration in 3d.

Advanced Robotics ,27(6):459–468, 2013.[14] Arnoud Visser and Bayu A Slamet. Balancing the information gainagainst the movement cost for multi-robot frontier exploration. In

European Robotics Symposium 2008 , pages 43–52. Springer, 2008.[15] Stefan Oßwald, Maren Bennewitz, Wolfram Burgard, and CyrillStachniss. Speeding-up robot exploration by exploiting backgroundinformation.

IEEE Robotics and Automation Letters , 1(2):716–723,2016.[16] Cyrill Stachniss. Multi-robot exploration using semantic place labels.In

Robotic Mapping and Exploration , pages 73–90. Springer, 2009.[17] Delong Zhu, Tingguang Li, Danny Ho, and Max Q-H Meng. Deepreinforcement learning supervised autonomous exploration in ofﬁceenvironments. In

Robotics and Automation (ICRA), 2018 IEEEInternational Conference on . IEEE, 2018.[18] Lionel Heng, Alkis Gotovos, Andreas Krause, and Marc Pollefeys.Efﬁcient visual exploration and coverage with a micro aerial vehiclein unknown environments. In

ICRA , volume 3, pages 3–5, 2015.[19] Bobby Davis, Ioannis Karamouzas, and Stephen J Guy. C-opt:Coverage-aware trajectory optimization under uncertainty.

IEEERobotics and Automation Letters , 1(2):1020–1027, 2016.[20] Y. T. Tan, Abhinav Kunapareddy, and Marin Kobilarov. Gaussianprocess adaptive sampling using the cross-entropy method for envi-ronmental sensing and monitoring.

IEEE International Conference onRobotics and Automation (ICRA), Brisbane, Australia , 2018.[21] H´ector H Gonz´alez-Banos and Jean-Claude Latombe. Navigationstrategies for exploring indoor environments.

The International Jour-nal of Robotics Research , 21(10-11):829–848, 2002.[22] Marin Kobilarov. Cross-entropy motion planning.