Learning to Navigate: Exploiting Deep Networks to Inform Sample-Based Planning During Vision-Based Navigation
Justin S. Smith, Jin-Ha Hwang, Fu-Jen Chu, and Patricio A. Vela

Abstract— Recent applications of deep learning to navigation have generated end-to-end navigation solutions whereby visual sensor input is mapped to control signals or to motion primitives. The resulting visual navigation strategies work very well at collision avoidance and have performance that matches traditional reactive navigation algorithms while operating in real-time. It is accepted that these solutions cannot provide the same level of performance as a global planner. However, it is less clear how such end-to-end systems should be integrated into a full navigation pipeline. We evaluate the typical end-to-end solution within a full navigation pipeline in order to expose its weaknesses. Doing so illuminates how to better integrate deep learning methods into the navigation pipeline. In particular, we show that they are an efficient means to provide informed samples for sample-based planners. Controlled simulations with comparison against traditional planners show that the number of samples can be reduced by an order of magnitude while preserving navigation performance. Implementation on a mobile robot matches the simulated performance outcomes.
I. INTRODUCTION
Contemporary machine learning (ML) algorithms demonstrate the potential to provide end-to-end learning such that navigation tasks can be replaced with black box image-to-control or image-to-action mappings [1], [2]. This data-driven approach is in line with, but on a path to supersede, traditional (manually engineered) reactive navigation strategies. The idea behind ML methods is that the underlying learner, today usually a deep network, implicitly learns structural models of the world within its internal representation. These implicit structural models are then encoded in the feature vector output of the network and support effective decision making at the output end, following the incorporation of a shallow learner.

The traditional deliberative, or global, navigation pipeline is heavily model based (in the sense that it requires models of the robot and the world) and is at odds with end-to-end machine learning approaches. Several ML-navigation papers further conclude that it is best to fuse their algorithms with higher level planners to overcome the short-term, greedy nature of ML-navigation [2], [3]. One structural issue with doing so is that typical end-to-end systems are often not meant to consider higher level planning actions, thus there is no obvious procedural strategy for achieving the desired recommendation. An exception is ML strategies that explicitly incorporate the task [3], or that learn motion primitives [4]. The latter can be concatenated with a higher level planner. Additional work is needed, though, to align the motion primitive control decisions with the global goal.

In this paper, we explore the standard ML pipeline used for end-to-end navigation and demonstrate that it is not well suited to incorporation into the traditional, closed-loop, goal-directed navigation pipeline. Furthermore, since traditional global navigation approaches involve exhaustive or highly sampled search, their performance relative to ML methods should be considered as upper bounds. Taking a systems view of vision-based navigation, we propose sensible modifications to ML-based navigation methods so that they are more compatible with the traditional goal-directed navigation stack. Importantly, the input/output function learnt must be altered to consider downstream use of the ML output. Implementing the modified ML-driven approaches in controlled simulations, we show that a planning-aware ML-based approach to navigation can indeed provide comparable performance to more global or exhaustive approaches while reducing the trajectory search space. The ML-based module effectively acts as a data-driven sampling heuristic for minimizing the trajectory search and providing presumably high quality navigation directions. The resulting planner leverages learning where learning excels, and relies on model-based strategies where they excel.

*This work supported in part by NSF Awards. J.S. Smith, J. Hwang, F.J. Chu, and P.A. Vela are with the School of Electrical and Computer Engineering and the Institute for Robotics and Intelligent Machines, Georgia Institute of Technology, Atlanta, GA 30308, USA. {jssmith, jhwang44, fujenchu, pvela}@gatech.edu

A. Related Work
Due to the extensive literature and history of research efforts on planning, this review of related work focuses primarily on contemporary learning-based approaches to navigation. It is presumed that the reader has some familiarity with planning in general [5].

Fig. 1: Block diagram of (a) local/reactive; (b) global/deliberative; and (c) mixed navigation pipelines.

Collision-free navigation requires processing sensor measurements to understand or infer the local scene geometry, then making a navigation decision based on both the scene geometry and the global goal points. For mobile robots with monocular cameras, depth information must be inferred from the sensing time signal. As an input/output mapping, ML has shown success in recovering good estimates of scene structure for navigation purposes [6], [7], [8]. The ML block generates missing information for decision making.

Others propose an end-to-end process, whereby image-to-control decisions are learnt instead [1], [2]. In the cases where expert demonstration is employed, it is not clear that the expert's motives for the control decisions necessarily translate to desirable navigation choices given the mobile robot's goals. When the control objective is clear, as in a stabilization task, then the image-to-control mapping is effectively injective, thereby reducing mismatch between implicit information contained in the training set and the explicit goal [9].

Another issue that arises with many end-to-end methods is that the underlying strategy is functionally a regressor [10]. For navigation, which is non-convex due to the topological splits that occur in trajectory space, regression may not be the best choice. Similar problems may arise for methods that learn motion primitives for specific scenarios [4], as a particular motion primitive may be biased towards one topological instance of a reaction from a set of potential reactions. The bias may undermine the global objective.

The work in [3] is one of the first to consider the goal state within end-to-end training, which involves laser scans and relative target waypoints as input and steering commands as output. However, their system still requires human intervention to correct for local minima since it is not linked to higher level planning. It is not clear how successful the system is at autonomous operation relative to the global objective since baseline comparison against standard methods is not performed.

An intermediate family of ML-navigation methods converts the image into actionable information (or affordance states), such as local structural information useful for driving along roadways [11] (e.g., distance to vehicles, lane locations, lane marker relative offsets, etc.). Another example learns to identify forest trails for navigation guidance [12]. The output states are then used to generate a navigation decision. The target application is usually a restricted form of navigation.

When input imagery is too novel or reflects extreme imaging conditions, then ML models may fail at the trained task. Detecting failure, or the potential of failure, can itself be cast into a learning framework to thereby supervise the online end-to-end ML controller [13], [14]. In [15], the supervisor is an autoencoder whose reconstruction should match the scene.
Failure to reconstruct the input image indicates an out-of-sample situation requiring a fall-back plan.

Within the area of Reinforcement Learning (RL), it is common to learn image-to-action mappings, usually in the context of video games [16], [17], [18], [19], [20]. By virtue of an existing reward function, the methods learn task-oriented behavior. Though the imagery may be quite diverse, the underlying task domain is usually limited in scope relative to the general task of navigation, and the approaches seek a slightly different purpose from the one stated here. Though [21], [22] concern navigation and even consider specific task-based objectives, the aim of the research is to demonstrate implicit learning of the environment from imagery based on repeated trials within the environment. Changing the environment requires further rounds of training.
B. Robotic Navigation Systems
Figure 1 depicts three varieties of navigation pipelines. The majority of ML-based navigation methods operate at the reactive level, Figure 1a, whereas some of the task-based and game-based methods operate as per Figure 1b for the case of a given, fixed environment. In these instances, novel or altered environments require additional training to re-learn. It is not clear that they could ever really operate at the deliberative or global level for general purpose navigation tasks, as the underlying model complexity is quite large. None of the methods described operate as per Figure 1c. This paper is concerned with the closed-loop consequences of traditional end-to-end ML pipelines while exploring means to achieve a fused system with ML for the local/reactive planner and a model-based approach for the global/deliberative planner.

As foundational elements for our local/global fusion, we utilize traditional methods that work at the local and global levels, such as the dynamic window approach (DWA) [23], elastic bands (EB) [24], and timed elastic bands (TEB) [25], and their existing ROS implementations. Thus, the investigation does not seek to propose new planning strategies, nor new learning architectures, but rather it seeks to investigate how to effectively integrate ML-based approaches into existing navigation pipelines. The remainder of this paper covers the deep learning strategy, the planning integration, and test results from simulation and real-world operation.
II. DEEP LEARNING
The AlexNet [26] deep convolutional neural network serves as the foundation for our model. To facilitate learning from few training samples, pre-training of the network involves using the ImageNet dataset [27] and its associated training task. We replace the final layer of the network with a shallow, single-layer learner (described in II-C). Several learning tasks are explored, including the traditional approaches from the literature: control action regression (continuous output), and control action classification (discrete output). Prior to explaining the different output spaces, we first describe the common aspects of the training methodology.
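As a concrete illustration of this setup, the following is a minimal sketch of swapping the backbone's final ImageNet layer for a shallow task head, written with the TF2 Keras API. The `alexnet_features` extractor is a hypothetical helper standing in for the pretrained network described above; the output-head sizes follow the dimensions quoted in II-C. This is a sketch under those assumptions, not the paper's implementation.

```python
import tensorflow as tf

# Minimal sketch, assuming `alexnet_features` is a Keras-callable, ImageNet-
# pretrained AlexNet feature extractor (hypothetical helper) ending in the
# 1024-d feature vector referenced in Section II-C.
def build_model(alexnet_features, n_outputs):
    x_in = tf.keras.Input(shape=(227, 227, 3))      # AlexNet's expected input
    feats = alexnet_features(x_in)                  # pretrained backbone
    head = tf.keras.layers.Dense(n_outputs)(feats)  # shallow, single-layer learner
    return tf.keras.Model(x_in, head)
```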
A. Generating Data Samples
As is rapidly becoming the standard, we train with simulated data [2], [3], [8], [21]. We use the Gazebo simulator to model our robot and a depth sensor in various scenarios.
Capturing scenes:
Three (roughly cylindrical) trash cans are randomly (uniformly) placed in a rectangular region in front of the robot. The region extends from 1 to 5 meters in front of the robot and 3 meters to either side. Not all of the region can be seen by the depth sensor, resulting in some scenarios with fewer than 3 trash cans in them. A depth image is captured by the depth sensor then decimated by 4, resulting in image samples with dimensions 140x120.

Fig. 2: Training Sample: Depth image with non-colliding trajectory poses overlaid in green.
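A one-line sketch of the decimation step described above (NumPy; the helper name is illustrative, not from the paper):

```python
import numpy as np

def decimate_depth(depth, factor=4):
    # Keep every `factor`-th pixel in each axis; the paper reports 140x120
    # samples after decimating the raw depth image by 4.
    return depth[::factor, ::factor]
```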
Generating Trajectories:
Using the trajectory generation and collision-checking approaches of [28], we evenly sample 51 robot departure angles over a symmetric range of angles (in radians). The process involves converting the departure angle into a feasible trajectory, generating a sequence of poses along the trajectory, then checking each pose for collisions. The non-colliding poses per departure angle reveal how far the robot can safely travel for each departure angle (see Figure 2). This approach implicitly encodes the geometry of the robot relative to the geometry of the world, as the collision checking employs a solid model of the robot. A robot with different geometry, motion kinematics/dynamics, and dynamic constraints would need this training phase to incorporate those models. Our model was the TurtleBot 2 mobile robot, whose constraints are obtained from the manufacturer's datasheet.
Retaining Final Pose: We retain the last non-colliding pose of each trajectory tested. The pose information is later converted into the appropriate output signal based on the desired output space. The task metric employed for a given learning scenario dictates how to use the stored final pose information.
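To make this step concrete, the sketch below (plain NumPy; the function names and array layout are assumptions, not the paper's code) walks each trajectory until the first collision, retains the final free pose, and converts the per-angle end-pose distances into targets for the three output spaces of Section II-C: the best angle for regression, its index for the best-angle classifier, and per-angle binary labels using the 4.0 m threshold for the collision-free classifier.

```python
import numpy as np

def last_free_pose(poses, collides):
    """Walk a trajectory's pose sequence; return the last non-colliding pose.

    poses: sequence of (x, y) body-frame positions along the trajectory.
    collides: callable pose -> bool, the solid-model collision check.
    Returns None if even the first pose collides.
    """
    last = None
    for pose in poses:
        if collides(pose):
            break
        last = pose
    return last

def make_labels(final_poses, angles, free_dist=4.0):
    """Convert retained final poses into the three output-space targets.

    final_poses: (51, 2) array, last free (x, y) per departure angle.
    angles: (51,) array of departure angles (radians).
    """
    d = np.linalg.norm(final_poses, axis=1)        # distance from body-frame origin
    best_angle = angles[np.argmax(d)]              # regression target
    best_index = int(np.argmax(d))                 # best-angle classifier target
    free_mask = (d >= free_dist).astype(np.int64)  # 51 binary labels (4 m rule)
    return best_angle, best_index, free_mask
```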
Generating Dataset Statistics:
We calculate the per-pixel means and standard deviations for the dataset and save them with the rest of the data. The statistics are used to normalize samples for training and inference.
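A minimal sketch of this bookkeeping, assuming the depth images are stacked in a NumPy array (the helper names are illustrative):

```python
import numpy as np

def per_pixel_stats(images):
    # images: (N, H, W) stack of depth images from the dataset.
    return images.mean(axis=0), images.std(axis=0)

def normalize(image, mu, sigma, eps=1e-6):
    # Standardize a sample with the stored per-pixel dataset statistics.
    return (image - mu) / (sigma + eps)
```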
B. Learning the Model
Implementation and training of the model is done with TensorFlow; in particular, using the Adam optimizer with a learning rate of 0.001. Training is performed with a batch size of 600 on an NVidia 1070GTX GPU. Training images are uniformly sampled from the dataset. After each epoch, the model is run against a separate testing set. When a sample is selected either for training or for testing, the image is normalized and resized to match the expected dimensions of AlexNet (227x227). The result is then copied across 3 channels (the original network was intended for RGB images).

For classification-based output spaces, the loss function is cross entropy with logits. For regression-based output spaces, the loss function is the L2-norm. Training ends when the loss levels off; this occurs within 500 iterations (~35 epochs).
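A sketch of the preprocessing and one training step under these settings, using the TF2 Keras API (the paper predates TF2, so this is a reconstruction of the described configuration, not the original code; `model` is any classification head from Section II-C, and the regression head would substitute an L2 loss):

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

def preprocess(depth, mu, sigma):
    # Normalize one depth image (H, W) with dataset statistics, resize to
    # AlexNet's 227x227 input, and replicate across the 3 RGB channels.
    x = (depth - mu) / sigma
    x = tf.image.resize(x[..., tf.newaxis], (227, 227))
    return tf.tile(x, [1, 1, 3])

@tf.function
def train_step(model, images, labels):
    # One optimization step with cross entropy on logits (classification heads).
    with tf.GradientTape() as tape:
        logits = model(images, training=True)
        loss = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(
                labels=labels, logits=logits))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```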
C. Input/Output Mappings
The deep network model is intended to inform the trajectory selection or search process by using depth images as inputs. In this work, we use the same family of trajectories as in our prior work [28]. These trajectories are parameterized by the initial robot velocity and a departure angle. As a simplification, we assume a constant forward velocity. Since these trajectories only involve strong rotations for brief periods, it is a reasonable assumption.

We implement three versions with different output spaces. The first two represent implementations reported in the literature. In all of the versions, the intermediate label representation is the Euclidean distance from the body frame origin to the end pose of each departure angle (relative to the body frame).
1) Regression:
In this version, the last layer of the AlexNet model is replaced with a 1024x1 fully connected layer with no ReLU. The output of this model is a departure angle. The correct output is the departure angle corresponding to the greatest distance. In one version the regressor was goal-agnostic; in another it was goal-informed.
2) Best Angle Classifier:
In this version, the last layer of the AlexNet model is replaced with a 1024x51 fully connected layer with ReLU. The 51 output values represent the confidences of the corresponding departure angle being the correct one. The winning class is the departure angle with the greatest distance. In this strategy, there is competition amongst the classifiers.
3) Collision-free Angles Classifier:
In this version, the last layer of the AlexNet model is replaced with a 1024x102 fully connected layer with ReLU, with the output reshaped to 51x2. After applying softmax to the last dimension, each 1x2 pair represents the positive and negative confidences of a departure angle being non-colliding. A departure angle is classified as non-colliding if its distance is greater than or equal to 4.0 m, the approximate effective range of a Kinect. In contrast to the Best Angle Classifier, this version yields a family of 51 independent (non-competing) binary classifiers.

The last classifier is motivated by our experiences with the first three models trained (the two regressors and the best angle classifier), which did not have strong performance (see Section IV). Evaluating trajectories for collisions is computationally costly, therefore we seek to reduce the number of trajectories that we need to evaluate in order to find a satisfactory one. The purpose of the model is to learn a mapping between depth images and departure angles. The model should not dictate which departure angle to use, however, as that would leave no room for higher level planning systems to operate. Rather, it should indicate which departure angles are likely to yield good (i.e., non-colliding) trajectories.
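The three versions differ only in their final layer and loss. A minimal sketch of the output-space wiring (TF2 Keras; the layer helpers and the shared 1024-d `features` tensor are assumptions consistent with the sizes quoted above):

```python
import tensorflow as tf

def regression_head(features):
    # 1024 -> 1, no ReLU: predicts a single departure angle.
    return tf.keras.layers.Dense(1)(features)

def best_angle_head(features):
    # 1024 -> 51 competing confidences, one per departure angle.
    return tf.keras.layers.Dense(51, activation="relu")(features)

def collision_free_head(features):
    # 1024 -> 102 values reshaped into 51 independent binary classifiers;
    # softmax over the last axis gives (non-colliding, colliding) confidences.
    x = tf.keras.layers.Dense(102, activation="relu")(features)
    x = tf.keras.layers.Reshape((51, 2))(x)
    return tf.nn.softmax(x, axis=-1)
```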
III. PLANNING FRAMEWORK

At every local path-planning invocation, the depth image is provided to the trajectory generator to generate an output. For the ML-navigation strategies that lead to a single outcome, the high-level planner plays little to no role. For all other strategies, the samples get passed to a trajectory evaluator that then selects the locally optimal choice. We incorporate Dynamic Window Approach, Elastic Band, and Timed Elastic Band local planner options within a global planning framework using the ROS move_base package. These planners served as the baseline planner approaches for the comparison with ML-integrated global/local planners.

The ROS path planning and navigation stack supports several cost functions to guide the navigation. With respect to the current position and orientation of the robot, the scoring functions included in our navigation objective function are the following:
1) Goal Heading: robot directed relative to goal direction;
2) Path Heading: path keeps the robot nose on the nose path;
3) Path Distance: path proximity to the global path;
4) Goal Distance: path leads robot closer to goal; and
5) Obstacle Cost: path avoids obstacles.
They score the sampled trajectory according to the preferences stated above. The first two score the orientation component of the path, the second two score the position, and the last embodies obstacle avoidance. (The "nose" is a virtual point ahead of the robot that exploits differential flatness to simplify the feedback control signal. The planned path also has such a point; it implicitly encodes the path tangent.) Once the trajectory that the robot should follow is selected, the trajectory is then converted into velocity commands and executed in the low-level trajectory controller. The navigation system follows the path until the next planning stage. For laser scan-based local planners that store obstacles in a Cartesian occupancy grid, re-planning is triggered every 1 second. The laser scan data is simulated from a single horizontal line from the depth image. Cartesian grid planners with this laser scan data will inherit any field-of-view limitations associated to the depth image. The perception space local planners [28] re-plan when the robot achieves 60% of the planned trajectory from the last planning stage or if it detects a collision along the predicted path (occurs with dynamic obstacles).

Unlike traditional learning-based approaches that are usually trained without awareness of the goal state, our system implements two additional methods on top of the trajectory selection algorithm to find both locally and globally optimal candidates. One is a "to-goal" approach where the departure angle that points most directly to the goal is added in addition to the top k departure angles from the model. The other is a "Gaussian-based goal bias" approach, which biases trajectory selection based on the relative position of the goal. It weights the raw classification confidence scores by a Gaussian prior centered on the departure angle that would point directly to the goal. In Figure 3, long arrows represent unbiased trajectory candidates while the shorter arrows indicate the weighted values.

Fig. 3: Sample scenario with a barrel straight ahead. (a) Depth image shown with color coded rays indicating the classifier confidence of each angle. Red denotes highest confidence and blue lowest. (b) Depiction of goal influence on confidence. The longer vectors are the original confidences, while the shorter vectors of the inner wedge of rays have the goal-influenced confidences. For a straight ahead goal the left and right splits are equally high in the directions that minimize path length. (c) Depicts the same for a goal to the left. Now the confidences are biased towards the goal and there is no left/right split for the goal-influenced confidences.
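A compact sketch of both goal-conditioning methods over the 51-angle confidence vector (plain NumPy; `sigma` and the helper names are illustrative assumptions, not values from the paper):

```python
import numpy as np

def to_goal_candidates(conf, angles, goal_angle, k=5):
    # "To-goal": the model's top-k departure angles, plus the angle that
    # points most directly at the goal.
    top = angles[np.argsort(conf)[::-1][:k]]
    return np.append(top, goal_angle)

def gaussian_goal_bias(conf, angles, goal_angle, k=5, sigma=0.5):
    # "Gaussian-based goal bias": weight the raw confidences by a Gaussian
    # prior centered on the goal-pointing departure angle, then take top-k.
    prior = np.exp(-0.5 * ((angles - goal_angle) / sigma) ** 2)
    biased = conf * prior
    return angles[np.argsort(biased)[::-1][:k]]
```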
IV. EXPERIMENTAL EVALUATION
To evaluate the quality of the overall ML-enhanced navigation pipelines, we run the vision-based navigation system in Gazebo-simulated environments and on an actual mobile robot (Turtlebot). Simulation serves to create highly repeatable and consistent test conditions for evaluating a variety of planning implementations.

In what follows, we do not report the results for the regression-based approaches. Both the goal-agnostic and the goal-informed regression approaches perform poorly and cannot navigate without collision, in contrast to [10], who report success with a regression approach, as well as [3], whose method is goal-informed. We believe differences in training sets may be responsible for the differences in performance. Our training scenes involve randomly placed obstacles in the field of view and simulate only one instance in time, whereas those from [10] appear to be mostly corridor images that arise from actually executing a path. Our learnt model would have no notion of a corridor in its internal representation. The model in [3] is trained in a sparser environment with large obstacles using navigation commands from a global planner. Topologically distinct trajectory splits would not be as frequently observed in those models.
A. Barrel Forest
The first battery of navigation tests in a simulated environment involves positioning barrels in random locations within a 10x6 m world where the robot's goal is to travel 8 meters forward. The number of barrels randomly spawned is set to 3, 5, and 7 across three sets of trials. For each quantity of barrels, 50 scenes are created. For every instantiated scene, we initialize the navigation pipeline with each of the following local planners:
1) Dynamic Window Approach
2) Elastic Band
3) Timed Elastic Band
4) Learning PiPS + To-Goal
5) Learning PiPS + Goal-Biased
6) Learning Cartesian + To-Goal
7) Learning Cartesian + Goal-Biased
8) Learning Naive + Goal-Biased
where PiPS is the perception space planner [28]. The last planner tested is the traditional end-to-end planner with a discrete steering space and a winner-takes-all multi-class classifier (sometimes called the "naive" planner as it represents a typical first pass solution to the end-to-end learning problem, after regression). However, it also includes goal biasing. The objective of each task is to simply move from one end of the world to the other end without crashing.

The first three approaches described above use traditional exhaustive trajectory search, creating 200 trajectory candidates at each replanning phase. The learning-based approaches generate trajectories using the five departure angles with the highest confidences. For the Naive approach, the trajectory with the highest confidence is executed without scene analysis (i.e., no collision checking). In order to test the original quality of the trajectory selection strategies, we disable the recovery behavior performed by the global planner when the local planner reports being stuck.
1) Controller Comparison:
Figure 4 graphs empirical results for the barrel forest navigation task. Recall that performance should not be perfect due to (1) disabling of the local-minima recovery behavior and (2) field-of-view limitations that mean the robot might turn into an unseen object. In this experiment, the TEB local planner consistently has the highest success rate while the Naive controller has the lowest. However, the experiment shows that path-planning using only five ML-predicted trajectories per planning stage can achieve a success rate close to traditional exhaustive trajectory search-based planners, even though the ML methods sample only a small fraction (5 versus 200) of the trajectories considered by the exhaustive searchers.

Fig. 4: Barrel Forest Success Rate

The ML Cartesian planner with Gaussian goal biasing performs only 2% worse than TEB and outperforms EB and DWA. Learning PiPS with Gaussian goal biasing outperforms DWA. The difference in performance is that the Cartesian planners incorporate some memory in the local cost-map, whereas PiPS has no memory. It runs a higher risk of turning back into objects that leave the field of view when being passed. Adding memory should improve performance.
2) Trajectory number comparison:
Next, we test the impact of varying the number of trajectories generated by the ML planners. The number of barrels is fixed to 5 while the number of trajectories sampled is varied across four trial sets. There are 50 trial scenes with randomly spawned barrels. To increase the difficulty of the scenario, the barrel spawn zone is made smaller, thereby increasing the density of the "barrel forest." We use the Learning Cartesian planner with Gaussian goal biasing since it has the best performance of the ML approaches we are testing. As in the previous experiment, we disable recovery behaviors.

TABLE I: Success Rate vs Trajectory Candidate Quantities

From Table I, the experimental results show that the number of trajectory candidates provided by the learning-based model does not have a significant impact on success rate as long as more than three candidates are provided into the evaluation.

We examine the failure cases to determine their sources. In some of these cases, the classifier failed to return valid trajectories when an exhaustive search would have succeeded. These cases could be resolved by one of the following approaches: switching to an exhaustive search when the ML approach fails, or replanning the global path using the information gathered from local planners. Exhaustive searching should be avoided if possible, but can always be tried as a fallback approach. In order to test the second approach, we rerun the 11 failure cases from the scenario with 7 candidate trajectories with the recovery behavior enabled. In 6 out of the 11 runs, a new global plan is found that successfully reaches the goal. The remaining 5 cases fail due to the recovery behavior being tailored to laser scan models. The robot turns 360 degrees and returns to face the same scene. A more standard laser scanner would most likely provide better information than the depth-limited Kinect scanner. One option may be to create a new recovery behavior for visual navigation strategies.

Fig. 5: Sparse (left) and dense (right) sector worlds.

B. Sector World
We also test our approach in environments that are divided into sectors and populated with an assortment of objects. The two test environments are depicted in Figure 5, one sparsely populated and one more densely populated with more diverse object categories (recall that only garbage cans were in the training set). The object positioning remains static while the start and goal sectors are randomly selected for each test. The objective of this experiment is to navigate from the start zone and travel to the goal sector without crashing. In this experiment we compare the best PiPS approach (Learning PiPS + To-Goal) to the best traditional approach (TEB). For both worlds, we test 35 navigation sector-pairs, again with the recovery behavior disabled. The results are in Table II.

TABLE II: Experimental results for Sector World.

                World A (Sparse)                  World B (Dense)
                Success Rate (%)  Avg. Time (s)   Success Rate (%)  Avg. Time (s)
LPIPS-ToGoal    93.33             57.55           70                66.02
TEB             86.56             55.8            90                63.41

The ML-based trajectory generation strategy outperforms the exhaustive trajectory TEB local planner for the easy sector world. In the denser environment, the success rate decreases 23% for the ML planner and increases about 3.5% for the TEB planner. The less than perfect completion rate for both approaches reinforces the well-accepted observation that local planners need augmentation by global planners with recovery behaviors, as well as the need to consider the full navigation pipeline when creating and evaluating ML-based image-to-decision navigation methods.
C. Real World Implementation
Lastly, we implemented our ML-based planner module on a Turtlebot 2 equipped with a laptop. Given a goal and a known starting position, the robot is able to navigate through obstacles with only 5 ML-generated trajectory candidates. In 13 of the 15 tests, navigation from start to goal is successful. For 2 of the 15 cases, the Turtlebot collides with objects located just outside the field of view. Figure 6 consists of overhead views of one navigation scenario. The Turtlebot starts off screen at the upper-right (just past the barely visible white angle marker) and is directed to travel to another off-screen location at the lower-left of the image. The start and final positions are chosen such that following the straight line path results in collisions. In the last image one can see how closely the PiPS planner tends to get to obstacles when passing them.

Fig. 6: Overhead snapshots of the Turtlebot navigating an obstacle course (chronologically: from top left to top right, then bottom left to bottom right).
V. CONCLUSION
This paper explores the behavior of typical end-to-end machine learning based navigation strategies under controlled simulation conditions, as well as proposes a revised training and trajectory generation method based on the outcomes of the strategies. While demonstrating success at collision-free wandering in natural, real-world environments, the traditional ML algorithms perform poorly when faced with a goal-directed navigation task. By modifying the input/output mapping, adjusting the training data provided and its evaluation criteria, and incorporating well established goal-directed modifications to the planning pipeline, a deep learning based modification to existing navigation stacks is described. In controlled conditions, the proposed deep learning navigation strategy is shown to operate as effectively as commonly used methods in the robotics community. Furthermore, the number of trajectories needed to find a path is significantly reduced. Further work is needed to resolve the issues that arise from the limited field of view associated to dense visual sensors. One solution would be to incorporate additional memory within the local planner, whereas another would be to modify the model and training procedure to consider the time history of decisions in order to implicitly learn to create a buffer away from obstacles when passing them.
REFERENCES

[1] S. Ross, N. Melik-Barkhudarov, K. Shankar, A. Wendel, D. Dey, J. Bagnell, and M. Hebert, "Learning monocular reactive UAV control in cluttered natural environments," in Proceedings of IEEE International Conference on Robotics and Automation, Karlsruhe, Germany, 2013, pp. 1765–1772.
[2] F. Sadeghi and S. Levine, "CAD2RL: Real single-image flight without a single real image," in Proceedings of Robotics: Science and Systems, 2017.
[3] M. Pfeiffer, M. Schaeuble, J. Nieto, R. Siegwart, and C. Cadena, "From perception to decision: A data-driven approach to end-to-end motion planning for autonomous ground robots," in Proceedings of IEEE International Conference on Robotics and Automation, Singapore, 2017, pp. 1527–1533.
[4] T. Zhang, G. Kahn, S. Levine, and P. Abbeel, "Learning deep control policies for autonomous aerial vehicles with MPC-guided policy search," in Proceedings of IEEE International Conference on Robotics and Automation, Stockholm, Sweden, 2016, pp. 528–535.
[5] S. M. LaValle, Planning Algorithms. Cambridge University Press, 2006.
[6] K. Bipin, V. Duggal, and K. M. Krishna, "Autonomous navigation of generic monocular quadcopter in natural environment," in Proceedings of IEEE International Conference on Robotics and Automation, 2015, pp. 1063–1070.
[7] K. Lamers, S. Tijmons, C. De Wagter, and G. de Croon, "Self-supervised monocular distance learning on a lightweight micro air vehicle," in Proceedings of the IEEE International Conference on Intelligent Robots and Systems, Daejeon, South Korea, 2016, pp. 1779–1784.
[8] J. Michels, A. Saxena, and A. Ng, "High speed obstacle avoidance using monocular vision and reinforcement learning," in Proceedings of International Conference on Machine Learning, 2005.
[9] H. van Hoof, N. Chen, M. Karl, P. van der Smagt, and J. Peters, "Stable reinforcement learning with autoencoders for tactile and visual data," in Proceedings of the IEEE International Conference on Intelligent Robots and Systems, Daejeon, South Korea, 2016, pp. 3928–3934.
[10] L. Tai, S. Li, and M. Liu, "Autonomous exploration of mobile robots through deep neural networks," International Journal of Advanced Robotic Systems, vol. 14, no. 4, 2017.
[11] C. Chen, A. Seff, A. Kornhauser, and J. Xiao, "DeepDriving: Learning affordance for direct perception in autonomous driving," in IEEE International Conference on Computer Vision, Santiago, Chile, 2015, pp. 2722–2730.
[12] A. Giusti, J. Guzzi, D. C. Cireşan, F. L. He, J. P. Rodríguez, F. Fontana, M. Faessler, C. Forster, J. Schmidhuber, G. D. Caro, D. Scaramuzza, and L. M. Gambardella, "A machine learning approach to visual perception of forest trails for mobile robots," IEEE Robotics and Automation Letters, vol. 1, no. 2, pp. 661–667, 2016.
[13] S. Daftry, S. Zeng, J. A. Bagnell, and M. Hebert, "Introspective perception: Learning to predict failures in vision systems," in Proceedings of the IEEE International Conference on Intelligent Robots and Systems, 2016, pp. 1743–1750.
[14] D. M. Saxena, V. Kurtz, and M. Hebert, "Learning robust failure response for autonomous vision based flight," in Proceedings of IEEE International Conference on Robotics and Automation, Singapore, 2017, pp. 5824–5829.
[15] C. Richter and N. Roy, "Safe visual navigation via deep learning and novelty detection," in Proceedings of Robotics: Science and Systems, 2017.
[16] X. Guo, S. Singh, H. Lee, R. Lewis, and X. Wang, "Deep learning for real-time Atari game play using offline Monte-Carlo tree search planning," in Advances in Neural Information Processing Systems, 2014.
[17] M. Kempka, M. Wydmuch, G. Runc, J. Toczek, and W. Jaskowski, "ViZDoom: A Doom-based AI research platform for visual reinforcement learning," in IEEE Conference on Computational Intelligence and Games, Santorini, Greece, 2016.
[18] Y. Liang, M. Machado, E. Talvitie, and M. Bowling, "State of the art control of Atari games using shallow reinforcement learning," in Proc. of International Conference on Autonomous Agents & Multiagent Systems, Singapore, 2016, pp. 485–493. [Online]. Available: http://dl.acm.org/citation.cfm?id=2936924.2936996
[19] R. Mottaghi, H. Bagherinezhad, M. Rastegari, and A. Farhadi, "Newtonian image understanding: Unfolding the dynamics of objects in static images," in Proceedings IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, 2016, pp. 3521–3529.
[20] J. Wu, I. Yildirim, J. Lim, W. Freeman, and J. Tenenbaum, "Galileo: Perceiving physical object properties by integrating a physics engine with deep learning," in Advances in Neural Information Processing Systems, 2015.
[21] P. Mirowski, R. Pascanu, F. Viola, H. Soyer, A. Ballard, A. Banino, M. Denil, R. Goroshin, L. Sifre, K. Kavukcuoglu, D. Kumaran, and R. Hadsell, "Learning to navigate in complex environments," in Proceedings of International Conference on Learning Representations, Toulon, France, 2017.
[22] Y. Zhu, R. Mottaghi, E. Kolve, J. J. Lim, A. Gupta, L. Fei-Fei, and A. Farhadi, "Target-driven visual navigation in indoor scenes using deep reinforcement learning," in Proceedings of IEEE International Conference on Robotics and Automation, 2017, pp. 3357–3364.
[23] D. Fox, W. Burgard, and S. Thrun, "The dynamic window approach to collision avoidance," IEEE Robotics & Automation Magazine, vol. 4, no. 1, pp. 23–33, Mar 1997.
[24] S. Quinlan and O. Khatib, "Elastic bands: Connecting path planning and control," in Proceedings IEEE International Conference on Robotics and Automation, May 1993, pp. 802–807 vol. 2.
[25] C. Rösmann, W. Feiten, T. Wösch, F. Hoffmann, and T. Bertram, "Efficient trajectory optimization using a sparse model," in Proceedings of the European Conference on Mobile Robots, Sept 2013, pp. 138–143.
[26] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[27] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248–255.
[28] J. S. Smith and P. Vela, "PiPS: Planning in perception space," in Proceedings of IEEE International Conference on Robotics and Automation, 2017.