Learning to Predict Ego-Vehicle Poses for Sampling-Based Nonholonomic Motion Planning
Holger Banzhaf, Paul Sanzenbacher, Ulrich Baumann, J. Marius Zöllner
This work is an extended version of [1] and therefore partially copyrighted by IEEE.
Abstract — Sampling-based motion planning is an effective tool to compute safe trajectories for automated vehicles in complex environments. However, a fast convergence to the optimal solution can only be ensured with the use of problem-specific sampling distributions. Due to the large variety of driving situations within the context of automated driving, it is very challenging to manually design such distributions. This paper therefore introduces a data-driven approach utilizing a deep convolutional neural network (CNN): Given the current driving situation, future ego-vehicle poses can be directly generated from the output of the CNN, allowing the motion planner to be guided efficiently towards the optimal solution. A benchmark highlights that the CNN predicts future vehicle poses with a higher accuracy compared to uniform sampling and a state-of-the-art A*-based approach. Combining this CNN-guided sampling with the motion planner Bidirectional RRT* reduces the computation time by up to an order of magnitude and yields a faster convergence to a lower cost as well as a success rate of 100 % in the tested scenarios.
I. INTRODUCTION
Motion planning is one of the major pillars in the software architecture of automated vehicles. Its task is to compute a safe trajectory from start to goal while taking into account the vehicle's constraints, the non-convex surroundings, as well as the comfort requirements of passengers. In structured environments, such as highway driving, it is sufficient to solve the motion planning problem locally close to the lane centerline. Semi-structured environments or complex maneuvers, however, require a global solution as local approaches typically fail. Real-world examples of such scenarios are evasive maneuvers due to blocked roads (see Fig. 1), multi-point turns at dead ends, or various parking problems.

Despite the complexity of such situations, the real-time constraints of automated vehicles still require a fast convergence to a preferably optimal solution. In order to achieve this, a common approach in the literature is the design of hand-crafted heuristics that guide the motion planner towards the goal [2], [3]. However, finding a good heuristic that takes into account the nonholonomic constraints of the vehicle while adapting to the various driving situations is nearly as difficult as solving the motion planning problem itself.

An appealing solution for this dilemma is the combination of classic motion planning techniques [4] with machine learning (ML) algorithms that learn to guide the motion planner in a data-driven way. In such a tandem configuration, for instance, the ML algorithm generates potential solutions for the planning problem while the motion planner selects the best proposals and guarantees collision avoidance.

Holger Banzhaf is with Robert Bosch GmbH, Corporate Research, Automated Driving, Renningen, Germany. Paul Sanzenbacher and Ulrich Baumann were with Robert Bosch GmbH, Corporate Research, Automated Driving, Renningen, Germany. J. Marius Zöllner is with FZI Research Center for Information Technology, Karlsruhe, Germany.

Fig. 1: Evasive maneuver due to an accident blocking the road. The path of the ego-vehicle (green line) is computed using the learned vehicle pose predictions (gray arrows). The obstacles in the environment are visualized by the red voxels, and the corresponding two-dimensional cost map for motion planning is depicted on the ground.

This paper introduces and evaluates such a combination, in which a neural network guides the sampling-based motion planner Bidirectional RRT* (BiRRT*) [5]. Sampling-based rather than search-based planning [2], [6] is considered here as the latter requires expert knowledge to discretize the state-action space. Especially in tight environments, an improper choice of the discretization resolution directly influences the completeness of the search-based planner. Sampling-based motion planners, on the other hand, require a problem-specific sampling distribution to speed up the planning process. Within this context, the main contributions are:

• Prediction of future ego-vehicle poses given the current environment as well as the start and goal state using a deep convolutional neural network (CNN). The network is trained end-to-end using trajectories and observations recorded in a real-world driving simulator.
• Comparison of the proposed method with two baselines: uniform sampling and an A*-based approach.
• Evaluation of the different sampling distributions along with the generic motion planner BiRRT* in three challenging automated driving scenarios.

The remainder of this paper is organized as follows: Sec. II describes the related work, and Sec. III focuses on the neural network used for the prediction task. The performance of the CNN-guided motion planner is highlighted in Sec. IV, followed by a conclusion in Sec. V. A supplementary video can be found at https://youtu.be/FZPn3OHdQxk.

II. RELATED WORK
Recent advances in deep learning have opened up new possibilities to improve or even replace existing motion planning algorithms for robot navigation. For instance, impressive results have been achieved in the field of imitation learning, where a robot is trained to act based on expert demonstrations, such as end-to-end learning for self-driving cars. Here, deep neural networks have shown the capability to learn reactive policies given high-dimensional sensor inputs [7], [8]. However, these approaches cannot guarantee safety and might fail in situations where a global solution of the motion planning problem is required.

The shortcomings of pure ML-based solutions can be avoided by replacing only non-safety-relevant parts of existing motion planners with ML, and hence maintaining crucial properties like safety and optimality. Depending on the planning algorithm, different approaches exist in the literature. Optimization-based planners, for example, often suffer from short planning horizons. This issue can be resolved with a learned cost shaping term that aligns the short-term horizon with the long-term goal as proposed in [9].

In contrast to that, search-based planners in continuous state-action space require an action sampling distribution for graph expansion and a cost-to-go heuristic for node selection. The former is addressed in [10], where a generative adversarial network [11] is trained to guide the search towards the goal. Minimizing search effort through learned cost-to-go heuristics is achieved in [12]. However, iteratively evaluating the heuristic during search limits the capacity of the neural network due to real-time constraints. Furthermore, optimality can only be guaranteed in a multi-heuristic setup.

Sampling-based motion planners, on the other hand, require problem-specific sampling distributions for fast convergence to the optimal solution. A promising research direction is to learn these distributions using imitation learning. For instance, [13] integrates end-to-end learning in a sampling-based motion planner. A conditional variational auto-encoder [14] is trained in [15] to learn a sampling distribution of arbitrary dimension. While this approach seems generally promising, scaling it up to high-dimensional environment models, such as occupancy grids, remains an unresolved challenge. Opposed to that, [16] uses a CNN to predict future positions of the robot given a start and goal position as well as the current environment. The CNN's output is then used to bias the sampling of RRT* [17]. A similar idea is presented in this paper with the major difference that not only position, but entire pose predictions (position and orientation) are learned for nonholonomic motion planning.

Recent publications [18], [19], [20] have shown novel motion planning algorithms that conduct planning in a learned latent space rather than in the complex configuration space. The general idea is to simplify the planning problem by solving it in the lower-dimensional latent space. While still in an early development phase, the future of these approaches in safety-critical applications highly depends on the possibility to satisfy hard constraints like collision avoidance.

III. LEARNING EGO-VEHICLE POSE PREDICTIONS
This section introduces the model for the prediction of the ego-vehicle poses connecting a start with a goal state in an environment-aware fashion. Sections III-A to III-C highlight the data generation process, the model, and the sampling of vehicle poses from the model's output. The training process, hyperparameter optimization, and evaluation are finally described in Sections III-D to III-F.
A. Data Generation
Learning ego-vehicle predictions in a supervised fashion requires a diverse dataset consisting of the vehicle's trajectory from start to goal and the corresponding observations of the environment. Such a dataset with a total number of 13 418 trajectories is recorded in a real-world simulator using Gazebo and ROS. The implemented data generation process is fully automated and parallelized with multiple simulation instances, and requires no time-consuming manual labeling of the recorded training data.

Each recording is generated in one of the eleven scenarios visualized in Fig. 2. As this research is embedded in a project with a focus on low-speed maneuvering in tight environments, the designed scenarios focus on challenging setups that require the vehicle to act precisely in a highly non-convex workspace. Exemplary situations include blocked roads, dead ends, or different parking problems. Variation is introduced in each scenario by changing the start and goal pose of the ego-vehicle as well as the number and location of the static obstacles. For unstructured environments, this is implemented by randomly sampling the obstacles and the vehicle's start and goal pose. For semi-structured environments, obstacles are randomly assigned to predefined locations, such as parking spots. The ego-vehicle's start and goal pose are, in this case, randomly chosen within predefined locations corresponding to the entrance and exit of a scenario, the driveway, or a parking spot.

The following procedure is then applied to obtain a recording in the randomly configured instances of each scenario. First, the motion planner BiRRT* generates a curvature-continuous, collision-free path from start to goal as detailed in Sec. IV. Here, BiRRT* is guided by the A*-based heuristic described in Sec. III-F, and its initial motion plan is subsequently optimized. Next, a motion controller executes the computed solution, resulting in a trajectory with t = 1, ..., T vehicle states x_t ∈ X, where a state x_t = (x_t, y_t, θ_t, κ_t, v_t)^T at time step t is defined by the vehicle's position (x_t, y_t), orientation θ_t, curvature κ_t, and velocity v_t.
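As an illustration only, the recorded state tuple and its lower-dimensional pose subset might be represented as follows in Python (the class and field names are hypothetical; the paper does not specify an implementation):

```python
import math
from dataclasses import dataclass


@dataclass
class VehicleState:
    """One trajectory sample x_t = (x, y, theta, kappa, v)^T."""
    x: float       # position [m]
    y: float       # position [m]
    theta: float   # heading angle [rad]
    kappa: float   # curvature [1/m]
    v: float       # velocity [m/s]

    def pose(self):
        """Lower-dimensional vehicle pose: position and orientation only."""
        return (self.x, self.y, self.theta)


# A recorded trajectory is simply the ordered list of states x_1 ... x_T.
trajectory = [VehicleState(0.0, 0.0, 0.0, 0.0, 1.0),
              VehicleState(1.0, 0.1, math.pi / 16, 0.05, 1.2)]
```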
Note that throughout this paper, the term vehicle pose is only used to describe a lower-dimensional subset of the state including the vehicle's position and orientation. Finally, the observations from a simulated LiDAR perception are fused in a two-dimensional 60 m × 60 m occupancy grid with a resolution of 10 cm and recorded along the trajectory. A visualization of such a recording can be found in Fig. 4 on the left.

Fig. 2: Visualization of the eleven scenarios used in the generation of the dataset. Each scenario highlights a challenging driving task in the context of low-speed maneuvering in cluttered environments. First row: arena with a maze-like structure, blocked T-intersection, urban parking environment, and circular parking lot. Second row: dead end, blocked road, and cluttered roundabout. Third row: various parking scenarios, where the angle between the driveway and the parking spot is varied between four different values.

It has to be noted that, similar to [12], motion planning for data generation is conducted with full environmental information in order to guarantee fast convergence to a cost-minimizing solution. In contrast to that, the recorded occupancy grid only fuses the current history of sensor observations, resulting in unobserved areas due to occlusions. This forces the model in the learning phase to also resolve situations with partial observability, which requires an intuition of where possible solutions might lie.

B. Model
The model is designed to generate a sampling distribution over future vehicle poses connecting a start and goal state given an occupancy grid with environmental observations. Previous publications [16], [21], [22] have shown that CNNs are well suited for processing high-dimensional inputs and predicting multi-modal distributions over future ego-vehicle locations. For nonholonomic motion planning, however, it is essential to enrich such a spatial distribution with the information about the robot's heading angle at all predicted positions. Therefore, it is proposed to jointly learn a sampling distribution over future vehicle positions as well as a mapping from position to heading angle for a complete representation of a vehicle pose.

This task is realized with the encoder-decoder architecture shown in Fig. 3. The illustrated CNN, which contains several million parameters, is based on the well-known SegNet [23] from semantic segmentation. The main contribution is not the architecture of the model itself, as it can be easily exchanged with any other state-of-the-art architecture, but the representation of the input and output layers.

The proposed network takes five grids with a resolution of 256 px × 256 px as an input. These grids encode the static obstacles, the unknown environment, the past path, and the start and goal state of the vehicle (see Fig. 4). A simple and effective encoding of the vehicle state is achieved by a square that describes three pieces of information: (1) its location in the grid marks the position of the vehicle, (2) its inner pixels encode the corresponding vehicle velocity, and (3) its outer pixels depict the respective heading angle.

Fig. 3: Architecture of the CNN, which predicts a two-dimensional sampling distribution over future ego-vehicle positions (top right) and the corresponding mapping from position to heading angle (bottom right). Note that the five input grids, which describe the current driving situation, are blended into two images on the left.

Fig. 4: Visualization of a perpendicular parking task on the left, and the derived input grids (a)–(e) and label grids (f)–(h) for training the CNN. The ego-vehicle is represented by the green arrow in the center of the image on the left and the target vehicle pose by the white arrow without a tail. The color of the arrows indicates the vehicle's velocity at the respective pose.

Note that the curvature of the start state is implicitly encoded in the past path as can be seen in Fig. 3 at the top left. The input of the CNN is first encoded and then decoded back to its original size using alternating blocks of convolutional layers, max pooling layers, and unpooling layers. All convolutional layers except the last one are followed by a batch normalization [24] and a ReLU activation [25].

The CNN outputs four grids of the same resolution as the input grids. The first two grids give the results of a cell-wise classification that is trained to predict whether a given cell belongs to the future path or not. An example of such a prediction is shown in Fig. 3 at the top right, where the intensity corresponds to the probability p_path(c) of a cell c being part of the future path.
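How continuous poses are drawn from these output grids is detailed in Sec. III-C. As a concrete illustration, the following NumPy sketch performs that sampling; the function names, the threshold p_min, and the grid layout are assumptions rather than the paper's implementation, and the sine/cosine heading grids are the CNN's remaining two regression outputs introduced in the next paragraph:

```python
import numpy as np


def low_variance_sample(weights, n, rng):
    """O(N) low-variance (systematic) sampling from an unnormalized pmf."""
    cum = np.cumsum(weights)
    cum /= cum[-1]
    start = rng.uniform(0.0, 1.0 / n)
    return np.searchsorted(cum, start + np.arange(n) / n)


def sample_poses(p_path, sin_grid, cos_grid, n, cell_size=0.1, p_min=0.1,
                 rng=None):
    """Draw n vehicle poses (x, y, theta) from the CNN output grids.

    p_min is an assumed value; the paper thresholds p_path(c) but the
    exact threshold is not reproduced here.
    """
    rng = rng or np.random.default_rng()
    mask = p_path > p_min
    cells = np.argwhere(mask)                         # candidate cells (row, col)
    idx = low_variance_sample(p_path[mask], n, rng)   # step 1: sample cells
    rc = cells[idx]
    # step 2: uniform continuous position within each sampled cell
    # (column index maps to x, row index to y in this sketch)
    xy = (rc[:, ::-1] + rng.uniform(size=(n, 2))) * cell_size
    # step 3: heading angle from the sine/cosine regression grids
    theta = np.arctan2(sin_grid[rc[:, 0], rc[:, 1]],
                       cos_grid[rc[:, 0], rc[:, 1]])
    # step 4: concatenate position and heading into poses
    return np.column_stack([xy, theta])
```

The systematic resampler steps through the cumulative weights with a single random offset, which keeps the draw O(N) even for large sample counts.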
In contrast to that, the last two output grids contain the results of a cell-wise regression for the sine and cosine components of the robot's heading angle. The decomposition into sine and cosine has three advantages: (1) a cell with zeros in both components represents an invalid orientation and can therefore be used to label cells without angle information (see Fig. 4), (2) the cell-wise norm of the predicted components can be interpreted as a confidence measure p_θ(c) indicating whether angle information is available at a respective cell c (see Fig. 3 at the bottom right), and (3) the prediction of the heading angle can be treated as a regression task rather than a classification task, which yields a compact representation of the output and avoids an exponential growth of its dimension. However, the latter comes with the potential drawback of not being able to predict multi-modal, cell-wise heading angles. The experiments have shown, however, that this is only an issue in rare corner cases as most of the scenarios do not require the vehicle to change its heading angle significantly while staying in the same region of the environment.

C. Vehicle Pose Sampling
Generating i = 1, ..., N continuous vehicle poses x^[i]_pred from the output of the CNN is conducted in four steps. First, N random cells c^[i]_pred are sampled from the CNN's path prediction p_path(c), which can be seen as a probability mass function (pmf) that describes the relative probability of a cell c being part of the future path (see Fig. 3). Sampling from the pmf can be realized efficiently using the low-variance sampling algorithm in [26]. Its linear time complexity yields a fast computation even for large values of N. Next, a continuous position prediction (x^[i]_pred, y^[i]_pred) is obtained by sampling uniformly within the previously computed discrete cell c^[i]_pred. The heading angle prediction θ^[i]_pred is then evaluated at the cell c^[i]_pred. If required, it can be interpolated with respect to the continuous position (x^[i]_pred, y^[i]_pred), which is omitted here for the sake of simplicity. In a final step, the sampled position and the corresponding heading angle are concatenated, yielding a sampled vehicle pose x^[i]_pred.

Note that throughout this paper, samples are exclusively drawn from cells whose path probability p_path(c) exceeds a fixed threshold in order to only guide the motion planner towards regions with a high probability of being part of the future path. In contrast to that, visualizations of p_path(c) are never thresholded and show the unmodified output of the CNN.

D. Training and Metrics
The proposed model is trained end-to-end using 64 % of the recorded trajectories. The training dataset is further augmented by randomly selecting up to five different start states on a recorded trajectory, resulting in 807 273 (partly correlated) data points. This augmentation strategy does not only upscale the dataset, but also forces the CNN to adjust its prediction as more information on the vehicle's environment becomes available.

The parameters of the network are optimized using Adam [27] and the loss function

L = Σ_{c∈C} ( f_CE(c) · L_CE(c) + f_MSE(c) · L_MSE(c) ) + Σ_{w∈W} λ w², (1)

where the first term computes the cell-wise cross-entropy loss L_CE(c) of the classification task, the second term the cell-wise mean squared error L_MSE(c) of the regression task, and the last term the L2 regularization loss of the weights w ∈ W with a scaling factor λ. As the majority of cells in the label grids contain no path information (see Fig. 4), a weighting of the relevant cells is conducted with the functions f_CE(c) and f_MSE(c) given as

f_CE(c) = 1 + 𝟙_c · (γ_CE − 1), (2)
f_MSE(c) = 1 + 𝟙_c · (γ_MSE − 1), (3)

where γ_CE and γ_MSE are hyperparameters of the model, and 𝟙_c is the indicator function that is 1 if the future path crosses a cell c and 0 otherwise.

In order to evaluate the prediction capability of the model, two metrics are introduced below. The first metric measures the proximity of the N predicted samples x^[i]_pred ∈ X_pred to the ground truth trajectory x_t ∈ X by computing the average path deviation D according to

D(X, X_pred) = (1/N) · Σ_{i=1}^{N} min_{x_t∈X} d(x_t, x^[i]_pred), (4)

where d(•) describes the distance between a pose on the trajectory and a sample. It is computed by a weighted sum of the Euclidean distance and the angular deviation given as

d(x_t, x^[i]_pred) = w_pos · ||x^[i]_pred − x_t|| + w_θ · |θ^[i]_pred − θ_t|, (5)

where the different units of both terms are taken into account by fixing the weights w_pos and w_θ to constant values throughout this paper.
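Both metrics translate directly into NumPy. A minimal sketch, assuming poses are given as (x, y, θ) rows; the weight defaults are placeholders (the paper fixes w_pos and w_θ to values not reproduced here), and the gap function implements the projection-based definition of the second metric given in Eqs. (6) and (7) below:

```python
import numpy as np


def path_deviation(traj, samples, w_pos=1.0, w_theta=1.0):
    """Average path deviation D (Eqs. 4 and 5): mean distance of each
    predicted pose to its closest ground-truth pose.

    traj:    (T, 3) array of ground-truth poses (x, y, theta)
    samples: (N, 3) array of predicted poses (x, y, theta)
    """
    d_min = []
    for x, y, th in samples:
        # Eq. (5): weighted Euclidean distance plus absolute angular deviation
        d = (w_pos * np.hypot(traj[:, 0] - x, traj[:, 1] - y)
             + w_theta * np.abs(traj[:, 2] - th))
        d_min.append(d.min())
    return float(np.mean(d_min))  # Eq. (4): average over the N samples


def max_prediction_gap(s_ref, s_total):
    """Maximum prediction gap G (Eq. 7), given the arc lengths s_ref of the
    samples already projected onto the trajectory via Eq. (6)."""
    s = np.sort(np.asarray(s_ref, dtype=float))
    if s.size < 2:
        return 1.0  # a single sample leaves the whole trajectory uncovered
    return float(np.max(np.diff(s)) / s_total)
```

For example, samples projected to arc lengths of 0 m, 2 m, 5 m, and 10 m on a 10 m trajectory leave a largest uncovered interval of 5 m, i.e. G = 0.5.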
As the predicted poses are supposed to guide the motion planner through complex scenarios, it is beneficial to have evenly distributed samples along the ground truth trajectory. Therefore, the second metric measures the maximum prediction gap G in the following two steps: (1) project every sample onto the trajectory, and (2) evaluate the maximum arc length that is not covered by any sample. The projection in step 1 is defined as

x^[i]_ref = argmin_{x_t∈X} d(x_t, x^[i]_pred), (6)

where x^[i]_ref is the reference pose of the predicted sample x^[i]_pred on the trajectory. These reference poses are then added to the ordered list X_ref according to their arc length s^[i]_ref. The maximum prediction gap G ∈ [0, 1] is finally obtained by

G(X, X_ref) = max_{i=1,...,N−1} (s^[i+1]_ref − s^[i]_ref) / s_T, (7)

where the length of the recorded trajectory is denoted by s_T.

E. Hyperparameter Optimization
This subsection evaluates the influence of the hyperparameters on the CNN's prediction performance based on the previously derived metrics. The analysis is conducted on a validation dataset containing 16 % of the recorded trajectories. Similar to the training dataset, up to five different start states are selected on each trajectory, resulting in 10 730 (partly correlated) data points.

The different parametrizations of the model are trained on an Nvidia Titan X with a fixed batch size using the Python interface of TensorFlow. Training a model to convergence takes about 30 h, which corresponds to 90 000 iterations. A visualization of the training statistics for one of the conducted optimizations can be found in Fig. 5.

Fig. 5: Visualization of the training statistics with the hyperparameters set to γ_CE = γ_MSE = 25 and the selected learning rate. Note that for better comparability, the loss L is visualized without the L2 regularization term as it is only computed at training and not at inference time. The metrics G and D are evaluated on sampled vehicle poses.

In order to decrease the exponentially growing search effort, only the hyperparameters with the highest expected influence on the CNN's overall performance are optimized. (a) γ_CE = 10, γ_MSE = 100. (b) γ_CE = 25, γ_MSE = 25. (c) γ_CE = 100, γ_MSE = 10.
Fig. 6: Qualitative comparison of the different hyperparameters on a test set trajectory with sampled vehicle poses.

Therefore, two out of the five hyperparameters, namely the exponential learning rate decay lrd and the L2 regularization scaling factor λ, are fixed to constant values. The remaining hyperparameters are optimized in two steps. First, the learning rate lr is varied while keeping γ_CE and γ_MSE constant, and second, lr is set to its best value while γ_CE and γ_MSE are optimized.

The quantitative results of the hyperparameter optimization are listed in Tab. I.

TABLE I: Quantitative comparison of the hyperparameter optimization on the basis of sampled vehicle poses carried out on the validation dataset. The displayed values are given with mean and standard deviation (columns: lr, γ_CE, γ_MSE, G [%], D [−]).

It can be seen that lower learning rates result in a better prediction performance as they decrease the maximum prediction gap metric G as well as the average path deviation metric D. The insight from [21] that higher values for γ_CE incentivize the CNN to classify more cells as part of the future path can also be observed in Tab. I and Fig. 6. Thus, increasing γ_CE reduces the maximum prediction gap G while raising the average path deviation D because more samples are placed further away from the ground truth. A potential risk of too large values for γ_CE is a deterioration of the heading angle prediction, which cannot be recovered by increasing γ_MSE (see Tab. I). For the remaining evaluations, the model with γ_CE = 25 and γ_MSE = 25 is selected as this combination yields a smooth distribution of the samples in close vicinity of the ground truth.

F. Evaluation
The following paragraphs analyze the prediction capability of the CNN and benchmark its performance against uniform sampling and the A*-based approach called Orientation-Aware Space Exploration (OSE) [3]. To the best of the authors' knowledge, the latter is currently one of the most effective approaches for guiding a motion planner through complex environments. It is based on an A* graph search and explores the workspace with oriented circles as illustrated in Fig. 7.

Fig. 7: Orientation-Aware Space Exploration visualized by the white circles on the ground including sampled vehicle poses (gray arrows).

Discrete vehicle poses can then be obtained by sampling from three-dimensional Gaussian distributions located at the centers of the circles. The standard deviations of the Gaussians can be made dependent on the radius r and are set proportional to r for the translational components σ_x = σ_y and to a fixed fraction of π for the rotational component σ_θ. Additional parameters of the OSE heuristic used in this paper are listed in Tab. II.

TABLE II: Parameters of the Orientation-Aware Space Exploration: minimum radius, maximum radius, minimum clearance, number of neighbors, maximum curvature, and computation timeout.

A benchmark of the three approaches on the test dataset can be found in Tab. III. The 13 420 test cases with up to five different start states on each trajectory represent the remaining 20 % of the recordings. For a scenario-specific evaluation, trajectories are extracted from this dataset, half of which come from the scenario arena and the other half from parallel and perpendicular parking. Both uniform sampling and the OSE procedure are evaluated on a single core of an Intel Xeon E5, with the latter being implemented in C++. As opposed to that, the CNN is executed with its Python pipeline on an Nvidia Titan X and an Intel Xeon E5.

In comparison to uniform sampling and the OSE approach, Tab. III shows that the CNN predicts the vehicle poses more evenly along the ground truth path and yields an overall smaller deviation from it. The mean computation time of the CNN is comparable to the OSE heuristic, however, with a one-third smaller standard deviation. The OSE's high variance in computation time is due to the fact that the performance of graph search-based algorithms highly depends on the complexity of the environment. This also causes the problem that a solution might not be found before the timeout is reached, which occurred here in a small fraction of all test cases. In summary, the CNN, whose output is qualitatively visualized in Fig. 8, makes its predictions more reliably (no outages) and with a lower latency. Both aspects are key features in safety-critical applications such as automated driving.

TABLE III: Benchmark of uniform sampling, the OSE approach, and the CNN prediction on the test dataset. The metrics G and D as well as the computation time are evaluated on sampled vehicle poses and are given with mean and standard deviation. Besides the scenario-specific results, the table contains an ablation study over the CNN's input grids (all input grids; all but the obstacle grid; all but the unknown grid; all but the obstacle and unknown grid; all but the past path grid; all but the start grid; all but the goal grid).

Fig. 8: Illustration of the CNN prediction including sampled vehicle poses in different scenarios. The top row visualizes the performance on three test set trajectories recorded in the scenarios circular parking lot (left), parallel parking (middle), and arena (right). The bottom row displays the robustness of the approach in the novel scenarios construction zone (left) and cluttered 4-way intersection (middle). Both scenarios contain novel artifacts like construction barrels and dumpsters. The image on the bottom right shows a performance degeneration for a long prediction horizon in an unstructured environment.

In order to better understand the effect of the different input grids on the performance of the CNN, an ablation study has been conducted. The results in Tab. III show that removing features from the CNN's input causes a deterioration of at least one of the analyzed metrics. Furthermore, it can be seen that excluding the goal or the observations causes the greatest decline in performance. In contrast to that, the metrics only change slightly if the CNN is trained and evaluated without the obstacle or the unknown grid (not both). One of the reasons for this is that in many cases, both grids are complementary in the sense that the unknown grid allows to infer the obstacles and vice versa (see Fig. 4).

The scenario-specific benchmark in Tab. III highlights that the CNN's performance in the parking scenarios is almost a factor of two better compared to the maze-like structure arena. The resulting insight is that learning stationary sampling distributions in completely unstructured environments is a much harder task for the network than in semi-structured environments. This can also be seen in Fig. 8 on the bottom right, where the CNN prediction degenerates due to the complexity of the scenario. Possible reasons for this are longer prediction horizons, a larger variety of feasible maneuvers, and potentially heavier occlusions. It is left for future work to determine which network structure is the most suitable one for such challenging environments.

A visualization of the CNN prediction in two novel scenarios, namely a construction zone and a cluttered 4-way intersection, can be found in the lower left images of Fig. 8. Both scenarios were not seen during training and contain novel artifacts such as construction barrels and rectangular dumpsters. While the construction zone only requires the vehicle to follow the lane, the maneuver at the cluttered 4-way intersection consists of three steps: (1) exit the tight parking spot, (2) avoid the dumpster at the side of the road, and (3) make a turn at the intersection. The illustrated predictions showcase the model's capability to deal with so far unseen artifacts as well as to generalize to novel scenarios. Only in the second case, the initial path prediction overlaps with the barriers in front of the vehicle. However, this is not a safety issue if the predictions are combined with a motion planner as highlighted in the next section.

Fig. 9: Guided motion planning in three automated driving scenarios. Exemplary solutions including CNN-predicted vehicle poses (gray arrows) are shown in the bottom row. Scenario I (left) illustrates a blocked intersection, where the red ego-vehicle has to execute a multi-point turn. A narrow passage problem due to a blocked road can be seen in Scenario II (middle), while a high-density parking environment [28] is displayed in Scenario III (right).

IV. GUIDED MOTION PLANNING
This section evaluates the performance gain of the sampling-based motion planner BiRRT* guided by the pose predictions of the CNN. As before, the results are compared with purely uniform sampling and the OSE approach. The benchmark is conducted using Gazebo and ROS in three challenging automated driving scenarios that were excluded from the dataset in the previous section. As can be seen in Fig. 9, these scenarios differ from the ones used to train the CNN (see Fig. 2) with respect to the spatial arrangement of the objects. Additionally, they contain novel artifacts like traffic cones and previously unseen vehicle geometries.

The parameters of the ego-vehicle are based on a full-size car and can be found in Tab. IV. Both the maximum curvature and the maximum curvature rate already include a 10 % control reserve for closed-loop execution.
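The control reserve can be understood as derating the vehicle's physical limits before handing them to the planner, so that the tracking controller keeps authority to correct deviations during closed-loop execution. A minimal sketch of this idea follows; the numeric limits and all names in it are illustrative assumptions, not the values from Tab. IV:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VehicleLimits:
    kappa_max: float  # maximum curvature [1/m]
    sigma_max: float  # maximum curvature rate [1/m^2]

def with_control_reserve(limits: VehicleLimits, reserve: float = 0.10) -> VehicleLimits:
    """Shrink the physical actuation limits by a relative reserve so the
    closed-loop controller retains margin for correcting tracking errors."""
    if not 0.0 <= reserve < 1.0:
        raise ValueError("reserve must lie in [0, 1)")
    scale = 1.0 - reserve
    return VehicleLimits(limits.kappa_max * scale, limits.sigma_max * scale)

# Hypothetical physical limits of a full-size car (placeholder values).
physical = VehicleLimits(kappa_max=0.20, sigma_max=0.20)
planner_limits = with_control_reserve(physical)  # limits actually used for planning
```

All sampled paths are then steered and collision-checked against the derated limits rather than the physical ones.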
TABLE IV: Parameters of the ego-vehicle.

  parameter                value
  length                   4.926 m
  width                    2.086 m
  wheel base               2.912 m
  maximum curvature        . m⁻¹
  maximum curvature rate   . m⁻²

A
60 m ×
60 m occupancy grid with a resolution of
10 cm is used to represent the static environment. Collision checks
are executed every 10 cm using a convex polygon approximation of the vehicle's contour. The vertices of that polygon are inflated by 10 cm, serving as a hard safety buffer. An additional soft safety margin is introduced by converting the occupancy grid into a cost map with a 25 cm inflation radius, as illustrated in Fig. 9.

TABLE V: Motion planning results after 100 runs of the same experiment in the three described scenarios. The table displays the computation time for pose predictions, the time-to-first-solution TTFS, the number of vertices in the tree, the number of cusps, the path length, the cost of the first solution cost_TTFS, the cost after 3 s of optimization cost_TTFS+3s, and the success rate. Values given with mean and standard deviation are highlighted with a plus-minus sign.
  columns: scenario | heuristic | steer. fct. | pred. [ms] | TTFS [s] | vertices [−] | cusps [−] | length [m] | cost_TTFS [−] | cost_TTFS+3s [−] | success [%]

In order to compute an initial motion plan, BiRRT* is executed for a maximum of 10 s. Another 3 s are assigned for optimization if an initial solution is found. Otherwise, failure is reported for the corresponding run. Note that all computations are executed together with the simulation on an Intel Xeon E5 and an Nvidia Quadro K2200, which raises the CNN's computation time by a factor of two compared to the previous benchmark (see Tab. III and Tab. V).

Uniform sampling is conducted in the
60 m ×
60 m region of the occupancy grid with a fixed goal sampling frequency. For guided motion planning, heuristic samples are generated in batches at a fixed frequency. Each batch of samples is then evenly mixed with uniform samples such that the theoretical guarantees of BiRRT* are not violated. The sampled vehicle poses are connected using the continuous curvature steering functions CC-Dubins [29] in Scenario II and HC-Reeds-Shepp [30] in Scenario I/III, where the superscript denotes the initial and final curvature of the steering procedure. The major difference between the two steering functions is that the first one constrains the vehicle to driving forwards only, while the second one allows moving forwards and backwards.

A summary of the experimental results is shown in Tab. V, where each setup is repeated 100 times to compensate for randomization effects of the planner. It can be observed that BiRRT* in combination with the CNN-based predictions clearly outperforms the two other approaches. The CNN guides the motion planner to the best initial solution and helps it converge to the lowest cost. This is also highlighted in Fig. 10, which illustrates the convergence rate of the different approaches in the tested scenarios. In addition, guiding BiRRT* with the CNN is the only approach with a success rate of 100 % in all three scenarios. Compared to uniform sampling, the samples from the CNN not only stabilize the time-to-first-solution (TTFS), but also reduce its mean by up to an order of magnitude. This value is currently limited by the network's inference time and can be reduced even further by a more advanced implementation of the prediction pipeline.

Guiding the motion planner with the OSE heuristic yields a low computation time in Scenarios II and III, where the circle expansion gives a good approximation of the planned path. As opposed to that, the performance deteriorates significantly in Scenario I, where no solution can be found in a large fraction of the runs. This is due to the fact that the OSE approach assumes a circular holonomic robot and only considers the nonholonomic constraints using cost terms [3]. As a result, the OSE proposes an immediate turnaround maneuver in Scenario I, where an evasive multi-point turn would be required to resolve this situation (see Fig. 9).

V. CONCLUSION
The complexity of motion planning combined with the real-time constraints of automated vehicles requires heuristics that guarantee a fast convergence to a cost-minimizing solution. Within this context, a convolutional neural network (CNN) is proposed that predicts ego-vehicle poses from a start to a goal state while taking into account the current surrounding. A benchmark on a recorded dataset highlights the CNN's capability to predict path corridors with a higher accuracy compared to two baselines: uniform sampling and a state-of-the-art A*-based approach. Furthermore, it is demonstrated that the CNN has the capability to adapt its prediction to novel scenarios with previously unseen artifacts. Together with the sampling-based motion planner BiRRT*, this results in a significant reduction of the computation time to about 100 ms, high-quality paths, and success rates of 100 % in three challenging automated driving scenarios. In conclusion, the proposed method is especially suitable for real-time motion planning in complex environments.

VI. ACKNOWLEDGEMENT

The authors thank Florian Faion for valuable discussions and insights.

REFERENCES

[1] H. Banzhaf et al., "Learning to Predict Ego-Vehicle Poses for Sampling-Based Nonholonomic Motion Planning,"
IEEE Robotics and Automation Letters, 2019, DOI 10.1109/LRA.2019.2893975.

Fig. 10: Comparison of the convergence rate in the three motion planning problems: (a) Scenario I, (b) Scenario II, (c) Scenario III (avg. cost [−] and success [%] over time [s] for no heuristic, OSE, and CNN). The line colors distinguish the different heuristics, the solid lines illustrate the cost of the motion plan averaged over 100 runs, and the dotted lines display the percentage of succeeded runs. Note that while the cost of a single run is always monotonically decreasing, averaging over multiple runs might lead to a non-monotonic behavior.

[2] D. Dolgov et al., "Path Planning for Autonomous Vehicles in Unknown Semi-structured Environments," The International Journal of Robotics Research, 2010.
[3] C. Chen et al., "Path Planning with Orientation-Aware Space Exploration Guided Heuristic Search for Autonomous Parking and Maneuvering," in
IEEE Intelligent Vehicles Symposium, 2015.
[4] D. González et al., "A Review of Motion Planning Techniques for Automated Vehicles," IEEE Transactions on Intelligent Transportation Systems, 2016.
[5] M. Jordan and A. Perez, "Optimal Bidirectional Rapidly-Exploring Random Trees," MIT, Tech. Rep. TR 2013-021, 2013.
[6] M. Likhachev and D. Ferguson, "Planning Long Dynamically-Feasible Maneuvers for Autonomous Vehicles," The International Journal of Robotics Research, 2009.
[7] D. A. Pomerleau, "ALVINN: An Autonomous Land Vehicle In a Neural Network," in Advances in Neural Information Processing Systems, 1989.
[8] M. Bojarski et al., "End to End Learning for Self-Driving Cars," arXiv preprint arXiv:1604.07316, 2016.
[9] A. Tamar et al., "Learning from the Hindsight Plan - Episodic MPC Improvement," in IEEE International Conference on Robotics and Automation, 2017.
[10] B. Kim, L. P. Kaelbling, and T. Lozano-Pérez, "Guiding Search in Continuous State-action Spaces by Learning an Action Sampler from Off-target Search Experience," in AAAI Conference on Artificial Intelligence, 2018.
[11] I. Goodfellow et al., "Generative Adversarial Nets," in Advances in Neural Information Processing Systems, 2014.
[12] S. Choudhury et al., "Data-driven planning via imitation learning," The International Journal of Robotics Research, 2018.
[13] C. Hubschneider et al., "Integrating End-to-End Learned Steering into Probabilistic Autonomous Driving," in IEEE International Conference on Intelligent Transportation Systems, 2017.
[14] K. Sohn, H. Lee, and X. Yan, "Learning Structured Output Representation using Deep Conditional Generative Models," in Advances in Neural Information Processing Systems, 2015.
[15] B. Ichter, J. Harrison, and M. Pavone, "Learning Sampling Distributions for Robot Motion Planning," in IEEE International Conference on Robotics and Automation, 2018.
[16] N. Pérez-Higueras et al., "Learning Human-Aware Path Planning with Fully Convolutional Networks," in IEEE International Conference on Robotics and Automation, 2018.
[17] S. Karaman and E. Frazzoli, "Incremental Sampling-based Algorithms for Optimal Motion Planning," Robotics: Science and Systems VI, 2010.
[18] A. H. Qureshi et al., "Motion Planning Networks," arXiv preprint arXiv:1806.05767, 2018.
[19] B. Ichter and M. Pavone, "Robot Motion Planning in Learned Latent Spaces," arXiv preprint arXiv:1807.10366, 2018.
[20] A. Srinivas et al., "Universal Planning Networks," in International Conference on Machine Learning, 2018.
[21] U. Baumann et al., "Predicting Ego-Vehicle Paths from Environmental Observations with a Deep Neural Network," in IEEE International Conference on Robotics and Automation, 2018.
[22] L. Caltagirone et al., "LIDAR-based Driving Path Generation Using Fully Convolutional Neural Networks," in IEEE International Conference on Intelligent Transportation Systems, 2017.
[23] V. Badrinarayanan et al., "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation," arXiv preprint arXiv:1511.00561, 2015.
[24] S. Ioffe and C. Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift," in International Conference on Machine Learning, 2015.
[25] V. Nair and G. E. Hinton, "Rectified Linear Units Improve Restricted Boltzmann Machines," in International Conference on Machine Learning, 2010.
[26] S. Thrun et al., Probabilistic Robotics. MIT Press, 2005.
[27] D. P. Kingma and J. L. Ba, "Adam: A Method for Stochastic Optimization," in International Conference on Learning Representations, 2015.
[28] H. Banzhaf et al., "High Density Valet Parking Using k-Deques in Driveways," in IEEE Intelligent Vehicles Symposium, 2017.
[29] T. Fraichard and A. Scheuer, "From Reeds and Shepp's to Continuous-Curvature Paths," IEEE Transactions on Robotics, 2004.
[30] H. Banzhaf et al., "Hybrid Curvature Steer: A Novel Extend Function for Sampling-Based Nonholonomic Motion Planning in Tight Environments," in