Group Marching Tree: Sampling-Based Approximately Optimal Motion Planning on GPUs
Brian Ichter
Aeronautics & Astronautics, Stanford University, Stanford, California 94305. Email: [email protected]
Edward Schmerling
Institute for Computational & Mathematical Engineering, Stanford University, Stanford, California 94305. Email: [email protected]
Marco Pavone
Aeronautics & Astronautics, Stanford University, Stanford, California 94305. Email: [email protected]
Abstract—This paper presents a novel approach, named the Group Marching Tree (GMT∗) algorithm, to planning on GPUs at rates amenable to application within control loops, allowing planning in real-world settings via repeated computation of near-optimal plans. GMT∗, like the Fast Marching Tree (FMT∗) algorithm, explores the state space with a "lazy" dynamic programming recursion on a set of samples to grow a tree of near-optimal paths. GMT∗, however, alters the approach of FMT∗ with approximate dynamic programming by expanding, in parallel, the group of all active samples with cost below an increasing threshold, rather than only the minimum cost sample. This group approximation enables low-level parallelism over the sample set and removes the need for sequential data structures, while the "lazy" collision checking limits thread divergence, all contributing to a very efficient GPU implementation. While this approach incurs some suboptimality, we prove that GMT∗ remains asymptotically optimal up to a constant multiplicative factor. We show solutions for complex planning problems under differential constraints can be found in ~10 ms on a desktop GPU and ~30 ms on an embedded GPU, representing a significant speedup over the state of the art, with only small losses in performance. Finally, we present a scenario demonstrating the efficacy of planning within the control loop (~100 Hz) towards operating in dynamic, uncertain settings.

I. INTRODUCTION
This work was supported by a Qualcomm Innovation Fellowship and by NASA under the Space Technology Research Grants Program, Grant NNX12AQ43G. Brian Ichter was supported by the DoD NDSEG Program.

Robotic systems are increasingly operating in real-world settings, away from the structure, repetition, and certainty of the factory floor, that require a robot not only to sense its environment and state in real time, but to react accordingly [1]. Acting in these paradigms often necessitates motion plans be computed on the basis of limited state and environmental knowledge, both of which may vary rapidly as information is gathered and the robot's surroundings change. A major challenge in this approach is thus replanning quickly, ideally up to the bound of the control feedback loop frequency (~100 Hz), particularly for systems governed by dynamic constraints operating in complex environments.

Sampling-based motion planning has emerged as an especially successful paradigm for rapid planning in complex, high-dimensional, and unstructured environments [2], and it has been shown to extend well to planning with differential constraints [3][4]. These methods probe the state space with a set of samples to be connected, under the supervision of a collision detection module, to form a traversable graph representation of the free state space. Sampling-based roadmap methods, such as the probabilistic roadmap algorithm (PRM) [5] and its asymptotically optimal variant PRM∗ [6], initially construct a graph where samples are connected to each of their near neighbors, provided the connection is collision-free. A shortest path search is performed on the resulting roadmap to yield solutions (up to the resolution constraints of the underlying graph) to the optimal planning problem [2]. As these methods are limited in their speed by the initial graph building stage, variants have been developed that simultaneously construct the graph edges while searching, e.g., Lazy PRM [7], which avoid performing any collision checks that are not required during the roadmap shortest path computation. The Fast Marching Tree algorithm (FMT∗) further reduces collision checking by implementing direct dynamic programming (as opposed to full shortest path search) while constructing a tree subgraph of a disk graph defined by connection cost, increasing performance particularly in complex, high-dimensional spaces [8]. Yet even with these advances, path plan computation times are often over an order of magnitude greater than the periods of controller loops, and with the slowing growth rate of CPU computational power (due primarily to limited clock frequency), it is unlikely raw CPU power will soon bridge this gap. We instead propose algorithm development for a different paradigm: parallel computing, with a particular focus on the interplay between algorithmic design and the many-thousand-core architectures of GPUs.
Unfortunately, while we seek a solution inspired by the dynamic programming literature for a single pair of start/goal states (the use case relevant to control loop planning), the inherently sequential nature of dynamic programming's minimum cost node expansion, on which, e.g., FMT∗ is built, complicates the necessary massive parallelization.

Statement of Contributions.
In this work, we propose the use of approximate dynamic programming (ADP) methods that leverage algorithm parallelism for greater speed while incurring only a bounded degree of suboptimality. We present the Group Marching Tree (GMT∗) algorithm that, like the Fast Marching Tree algorithm (FMT∗) [8], performs a "lazy" dynamic programming recursion on a set of samples in the state space to grow a tree of near-optimal cost-to-arrive paths. GMT∗, however, varies from the approach of FMT∗ with ADP by expanding the tree, in parallel, from the group of all active samples with cost below a threshold, rather than only the minimum cost sample (essentially locally relaxing the principle of optimality). This group approximation enables low-level parallelism over the sample set and removes the need for sequential data structures, allowing for massive parallelization on GPUs. The "lazy" collision checking further facilitates GPU implementation by limiting thread divergence at the lowest levels. While these approximations do introduce some suboptimality, we prove that GMT∗ remains asymptotically optimal up to a constant multiplicative factor and demonstrate through numerical experiments that the empirical loss is well below the theoretical bound.

We further discuss the implementation of GMT∗ on GPU architectures and show its application to several illustrative motion planning problems with differential constraints, for which we consider kinodynamic and nonholonomic planning. These numerical experiments show that solution trajectories can be computed in ~10 ms on a consumer grade GPU and ~30 ms on an embeddable GPU, achieving computation times two orders of magnitude faster than a state-of-the-art CPU algorithm and an order of magnitude faster than a state-of-the-art GPU algorithm, again with only small performance losses.
Lastly, we demonstrate the efficacy of planning within the control loop on a simplified quadrotor in a collapsing cave environment, with state disturbances and environmental dynamism.

Related Work.
Previous works have addressed planning in real-world settings through a number of methods. One approach is that of feedback motion planning, which traditionally defines a policy over the state space to allow the current state to be fed back into the controller [1]. Unfortunately, the ephemeral nature of the planning environment complicates this process; while ideally these feedback plans would always reflect the current knowledge state, they may become quickly outdated and inaccurate if updating or recomputing plans is too computationally intensive. Some methods exist to simplify this computation, such as [9], which computes a field of guiding vectors over the entire free state space; however, they generally must interpolate over the state space to define the local action and assume the state space can be easily represented. The high-frequency replanning approach of [10] is similar to our own in quickly replanning full trajectories. That work leverages parallelism to generate many rapidly-exploring random tree (RRT) trajectories, selecting the trajectory with the lowest collision probability at each computation step. Our work focuses on construction of a single tree to allow planning within the loop at rates of ~100 Hz, rather than the 4 Hz considered in [10]. Other works, e.g., RRTX [11], have accelerated the replanning approach by iteratively rewiring a single tree as new information becomes available. By reusing the previous tree at each time step, these methods are limited in scenarios where the environment changes drastically. Furthermore, the RRTX tree is rooted at the goal state to enable use in scenarios with disturbances, but this limits its utility in problems with a changing goal state. Another approach to planning in changing environments is to couple low-frequency global planners with high-frequency reactive controllers that determine actions which are collision-free and optimal in a local sense.
Using methods such as precomputed trajectory libraries and funnels [12], learning [13], and potential fields [14], this approach has shown practical success in many settings; however, its focus on the local region can ignore variations in global reachability resulting from actions (e.g., not accounting for momentum or nonholonomic constraints), and its use of heuristics to inform actions may incur suboptimality (e.g., in maze-like environments). In this work GMT∗ is shown to be capable of computing motion plans at a tempo comparable to reactive controllers, lessening the need for consideration of only local actions. GMT∗ can further be used in concert with reactive controllers to better inform actions via accurate cost-to-go computations, thus potentially benefiting from properties like the robustness in [12].

A main tenet of our work is the use of approximate dynamic programming (ADP) to allow parallelism while exploring the state space. Similar search methodologies, i.e., expanding wavefronts in low-cost groups, have been used successfully for graph search to allow parallelism and complexity reduction. Dial's algorithm [15] implements Dijkstra's algorithm on graphs with integer weights by stepping through buckets, exactly solving for the shortest paths. The Δ-stepping algorithm [16] generalizes Dial's algorithm to solve exactly the single source shortest path problem on non-negative real-valued weighted graphs by successively relaxing edges while stepping through buckets with width Δ; however, it performs extra work by revisiting edges to maintain exactness.
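To make the bucketed-wavefront idea concrete, Dial's algorithm can be sketched in a few lines of Python. This is an illustrative rendering assuming positive integer edge weights; the function name and graph encoding are ours, not from [15]:

```python
def dial_shortest_paths(adj, source):
    """Dial's algorithm: Dijkstra's algorithm with positive integer edge
    weights, replacing the priority queue with distance-indexed buckets.
    adj maps each node to a list of (neighbor, weight) pairs."""
    dist = {source: 0}
    buckets = {0: [source]}      # bucket d holds nodes with tentative distance d
    d = 0
    while buckets:
        for u in buckets.pop(d, []):
            if dist[u] != d:     # stale entry, superseded by a shorter path
                continue
            for v, w in adj.get(u, []):
                nd = d + w
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    buckets.setdefault(nd, []).append(v)
        d += 1                   # advance the wavefront to the next bucket
    return dist
```

GMT∗'s group expansion follows the same pattern of sweeping a cost threshold across buckets of nodes, but steps the threshold by δ = λr and, unlike Dial's algorithm and Δ-stepping, never revisits a settled node to restore exactness.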
The Group Marching Method [17] builds on the Fast Marching Method (an inspiration for FMT∗) to solve the eikonal equations by advancing a group of points together in two iterations, the first forward and the second backward to correct for instabilities. Our work employs a similar expansion strategy, but abandons any additional computation necessary to maintain exactness, instead leveraging the underlying disk graph to maintain asymptotic optimality within a constant factor.

Parallelization too has been applied successfully to motion planning by a number of researchers, finding significant algorithm accelerations. An early result in sampling-based motion planning showed that probabilistic roadmap methods are embarrassingly parallel [18], which was later extended to implementation on GPUs [19]. The focus of PRM-based approaches on entire graph construction, however, can be prohibitively slow even with GPUs. Common approaches to parallelization of sampling-based planning include focusing only on algorithm subroutines (such as collision checking and nearest neighbor search) [19][20], adapting serial algorithms via AND/OR-parallelism [10][21], or using load balancing and domain decomposition [22][23]. GMT∗'s ADP is parallel at the sample level, meaning many of these methodologies (e.g., collision checking, domain decomposition) are applicable in implementation. Furthermore, this low-level parallelism enables massive parallelization for use on GPUs.

Organization.
The remainder of this document is organized as follows. Section II describes the problem setup. Section III discusses the GMT∗ algorithm and proves its asymptotic optimality up to a constant multiplicative factor. Section IV discusses its implementation on GPUs. Section V demonstrates the performance of GMT∗ on motion planning problems under differential constraints. Lastly, Section VI summarizes our findings and proposes directions for future work.

II. PROBLEM SETUP
In this section and the next (which contains the description and analysis of the GMT∗ algorithm), for ease of exposition we consider the geometric planning problem: the problem, loosely speaking, of computing the shortest free path from an initial state to a goal region where any two states can be connected by a straight line. The problem is briefly overviewed here, but a full, detailed problem formulation can be found in [8]. Let X = [0, 1]^d be the state space, where d ∈ N, d ≥ 2. Let X_obs be the obstacle space, X_free = X \ X_obs be the free state space, x_init ∈ X_free be the initial condition, and X_goal ⊂ X_free be the goal region. A path σ : [0, 1] → X is said to be collision-free if σ(τ) ∈ X_free for all τ ∈ [0, 1]. A path is said to be feasible if it is collision-free, σ(0) = x_init, and σ(1) ∈ X_goal.

Problem 1 (Optimal path planning). Given a path planning problem (X_free, x_init, X_goal) and a cost measure c, find a feasible path σ∗ such that c(σ∗) = min{c(σ) : σ is feasible}. If no such path exists, report failure.

For Section III we consider the cost measure c(σ) as the arc length of σ with respect to the Euclidean metric and write ‖y − x‖ to denote the cost of the shortest path between x, y ∈ X. Though this is arguably the simplest formulation of robotic motion planning, we note the analytical distinction between (a) determining, for more general problem setups with differential constraints or alternative costs, whether a sample set in X admits a near-optimal trajectory as a sequence of local connections, and (b) arguing that a planning algorithm is capable of identifying such a high-quality solution given a sample set. We refer the reader to our previous works [3][4][24] for discussion on the first point, including expressions for local connection radii under both randomized and deterministic state space sampling, and abbreviate the relevant discussion (Theorems 1 and 3) in the present work.
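For concreteness, the arc-length cost measure and the feasibility conditions of Problem 1 can be written down for a discretized (waypoint) path. The Python sketch below is ours for illustration, with the free-space and goal predicates left as assumed inputs:

```python
import math

def path_cost(waypoints):
    """Arc length c(sigma) of a path given as a list of waypoints in [0, 1]^d."""
    return sum(math.dist(a, b) for a, b in zip(waypoints, waypoints[1:]))

def is_feasible(waypoints, x_init, in_free, in_goal):
    """Problem 1 feasibility: the path starts at x_init, ends in X_goal, and
    stays in X_free (checked here only at the waypoints; a full implementation
    must also check the connecting segments)."""
    return (waypoints[0] == x_init
            and in_goal(waypoints[-1])
            and all(in_free(p) for p in waypoints))
```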
The novel theoretical contribution of this paper is establishing that GMT∗, which achieves a high degree of parallelism in a single query approach unlike the FMT∗ and PRM∗ algorithms analyzed in those works, still recovers asymptotically optimal paths (up to a constant factor) under the same sampling and connection radius assumptions (Theorem 2). Our numerical experiments in Section V consider both kinodynamic and nonholonomic planning problems, specifically double integrator and Dubins airplane dynamics.

III. THE GROUP MARCHING TREE ALGORITHM
A. GMT∗

We now detail the Group Marching Tree algorithm (GMT∗), used to approximately solve the optimal path planning problem. (To accommodate alternative dynamics/costs we may instead consider d(x, y), the result of solving an optimal two-point boundary value problem connecting x to y, and replace any discussion of connection balls in Section III with the notion of bounded-cost reachable sets.) GMT∗ performs a "lazy", approximate dynamic programming recursion to grow a tree of paths in cost-to-arrive space. This amounts to iteratively attempting to expand all samples in the tree branches below a constantly increasing cost threshold, rather than expanding only the minimum cost sample. The resulting algorithm enables simultaneous graph building and exploration of the state space, in a manner amenable to complex planning problems, such as high-dimensional, cluttered environments and differential constraints.

A description of the algorithm is given in Alg. 1, with a single iteration visualized in Fig. 1. The algorithm takes as input the planning problem (X_free, x_init, X_goal), a sample set V_unexplored of n samples in X_free (at least one in X_goal), a connection radius r, and a group cost threshold factor λ ∈ (0, 1]. Together, λ and r define the group cost threshold increment δ, equal to λr. We refer to nodes (or interchangeably, samples) as neighbors if the connection cost between them is less than the connection radius r, as defined in Theorem 1,

    r = 4 (1 + η)^(1/d) (1/d)^(1/d) (μ(X_free)/ζ_d)^(1/d) (log n / n)^(1/d),

where η ≥ 0 is a tuning parameter, μ(X_free) denotes the d-dimensional Lebesgue measure of X_free, and ζ_d denotes the volume of the unit ball in d-dimensional Euclidean space. This r applies for geometric planning with Euclidean cost and V_unexplored sampled uniformly randomly from X_free; smaller r may be considered if V_unexplored is sampled with low-dispersion, deterministic sequences [24].

Algorithm 1
Group Marching Tree Algorithm
Require: connection radius r, group cost threshold factor λ, set V_unexplored of n samples in X_free, at least one in X_goal
1: Place x_init in V_open, set i = 0 and δ = λr
2: Initialize tree with root node x_init
3: Find nodes G in V_open with cost ≤ iδ
4: For each unexplored neighbor, x, of any node in G:
5:   Find neighbor nodes y in V_open
6:   Find locally-optimal connection to x from a node in y
7:   If that connection is collision-free:
8:     Add edge to tree
9:     Remove x from V_unexplored and add to V_open
10: Remove G from V_open and add to V_closed
11: Increment i
12: Skip to Line 3 until either:
    a: V_open is empty ⇒ return failure
    b: A node in G is in X_goal ⇒ return min cost path to X_goal

The algorithm proceeds by expanding a tree of paths outward through the state space, maintaining the samples in three sets: V_unexplored, V_open, and V_closed. V_unexplored consists of samples not yet added to the tree. V_open consists of samples added to the tree and still considered for expansion; intuitively these are the samples on the tree's outer "branches", i.e., close to the wavefront. V_closed consists of samples added to the tree and no longer considered for expansion; intuitively these are the samples too far from the edge of the expanding tree to make any new connections. The algorithm begins by adding only x_init to V_open, setting V_closed empty, and initializing the tree of paths with x_init at its root (Lines 1-2). At each iteration i, all nodes in V_open with cost below iλr (iδ) are placed into a set G, denoting the group of samples to be expanded in parallel (Line 3).
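To make the recursion concrete, the following is a minimal serial Python sketch of Algorithm 1 for geometric planning in the plane, together with the Theorem 1 connection radius. It is illustrative only: obstacles are omitted (the Line 7 check would go where noted), the names are ours, and the actual implementation is parallel CUDA C.

```python
import math

def connection_radius(n, d, mu_free, eta=0.0):
    """Connection radius r from Theorem 1 for geometric planning."""
    zeta_d = math.pi ** (d / 2) / math.gamma(d / 2 + 1)  # unit d-ball volume
    return 4 * ((1 + eta) * (1 / d) * (mu_free / zeta_d)
                * math.log(n) / n) ** (1 / d)

def gmt_star(samples, x_init, goal_test, r, lam):
    """Serial sketch of Algorithm 1 (line numbers refer to Alg. 1)."""
    dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
    delta = lam * r                                   # group threshold increment
    cost, parent = {x_init: 0.0}, {x_init: None}
    unexplored = set(samples) - {x_init}              # Lines 1-2
    open_set, i = {x_init}, 0
    while open_set:
        group = {v for v in open_set if cost[v] <= i * delta}      # Line 3
        new_open = set()
        for x in list(unexplored):                                 # Line 4
            if not any(dist(x, g) <= r for g in group):
                continue
            ys = [y for y in open_set if dist(x, y) <= r]          # Line 5
            y = min(ys, key=lambda v: cost[v] + dist(v, x))        # Line 6
            # Line 7: a full implementation collision-checks y -> x here
            cost[x], parent[x] = cost[y] + dist(y, x), y           # Line 8
            unexplored.remove(x)                                   # Line 9
            new_open.add(x)
        open_set = (open_set | new_open) - group                   # Line 10
        i += 1                                                     # Line 11
        goal_nodes = [v for v in group if goal_test(v)]            # Line 12b
        if goal_nodes:
            best = min(goal_nodes, key=lambda v: cost[v])
            path, v = [], best
            while v is not None:
                path.append(v)
                v = parent[v]
            return path[::-1], cost[best]
    return None, float("inf")                                      # Line 12a
```

With λ near 1 each group sweeps up nearly the whole open set; with λ near 0 the expansion order approaches that of FMT∗.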
The amount of parallelism of this step is controlled by the tuning parameter λ, referred to as the group cost threshold factor, which represents a trade-off between parallelism and potential unconsidered optimal connections; λ → 1 represents expanding all nodes in the open set at once, resulting in nearly a breadth-first search, while λ → 0 represents expanding only the minimum cost nodes in a given iteration, resulting in the same final solution as FMT∗. The other side of the tradeoff, the unconsidered optimal connections, arises when multiple nodes along the optimal path are considered in the same group expansion, meaning they cannot connect to each other. They may also occur when paths formed from the concatenation of several short connections fall behind the wavefront. These effects, however, are curtailed by the λ factor, the underlying disk graph's structure, and the notion that longer steps will generally be more favorable than many short steps. Even with this suboptimality, we show in Theorem 3 that asymptotic optimality within a constant multiplicative factor is maintained; furthermore, the performance loss observed in practice is studied in Section III-C and shown to be small. With the group of samples G in hand, all neighbors of G in V_unexplored are considered for addition to the tree (Line 4). Denoting each unexplored neighbor of samples in G by x and the neighbors of x in V_open by y, GMT∗ then selects the locally-optimal connection, where locally-optimal is defined as the connection with the lowest cost for the previously computed path to y concatenated with the straight line path from y to x (Line 6). Note that while this step potentially introduces suboptimal connections by lazily ignoring the presence of obstacles, as in FMT∗, these connections become vanishingly rare as the number of samples goes to infinity, as discussed and proven in [8].
If the selected connection is collision-free, it is added to the tree and x is removed from V_unexplored and added to V_open (Lines 7-9). When all samples have been considered, i is incremented and the samples in G are removed from V_open and added to V_closed (Lines 10-11). The algorithm then moves to the next iteration, beginning at Line 3, or terminates if either V_open is empty or a node in G is in X_goal (Line 12).

B. GMT∗ Approximate Asymptotic Optimality
Our analysis of GMT∗ begins with the concept of probabilistic exhaustivity as applied in related work establishing asymptotic optimality for a range of geometric [25] and differentially constrained [3][4] batch-processing, sampling-based motion planning algorithms. Briefly, probabilistic exhaustivity is the notion that within a sufficiently large set of uniformly sampled states, a sequence of samples approximating any path arbitrarily well may be found. This property may be used to construct sample sequences approaching the optimal solution that are amenable for recovery by a planning algorithm.

In the subsequent analysis, we define SampleFree(n) to be a function that returns n points sampled independently and identically from the uniform distribution on X_free, at least one of which is in X_goal. We define a path σ : [0, 1] → X and a path y : [0, 1] → X that sequentially connects the sequence of waypoints {y_m}_{m=0}^M ⊂ X with line segments. We say the sequence of waypoints {y_m} (ε, r)-traces the path σ if the following conditions hold: (i) ‖y_m − y_{m+1}‖ ≤ r for all m, (ii) the cost of y is bounded as c(y) ≤ (1 + ε) c(σ), and (iii) the distance from any point of y to σ is no more than r. We formally state this in Theorem 1 (proven as Theorem IV.5 in [3]).

Fig. 1: One iteration step of GMT∗ expansion, labeled with (Fig., Line in Alg. 1). (1a, Line 3) shows the selection of the group G. (1b, Line 4) shows the selection of the nearest neighbors of G in V_unexplored. (1c, Line 5) shows the candidate connections from which the locally optimal one is chosen to connect the new samples. (1d, Line 3) shows the new tree after an iteration, Lines 3-11, and the new group G. Note, each group relies only on pathwise cost; the iδ ball is shown only for illustrative purposes.

Theorem 1 (Probabilistic Exhaustivity).
Define a planning problem (X_free, x_init, X_goal), a feasible path σ : [0, 1] → X_free, a set of samples V = {x_init} ∪ SampleFree(n), and ε > 0. For a fixed n consider the event A_n that there exists {y_m}_{m=0}^M ⊂ V, y_0 = x_init, y_M ∈ X_goal, which (ε, r)-trace σ, where

    r = 4 (1 + η)^(1/d) (1/d)^(1/d) (μ(X_free)/ζ_d)^(1/d) (log n / n)^(1/d)

with η ≥ 0. Then, as n → ∞, the probability that A_n does not occur (denoted by its complement A_n^c) is asymptotically bounded as P[A_n^c] = O(n^(−η/d) log^(−1/d) n).

We now show that the cost of the path returned by GMT∗ is bounded within a constant multiplicative factor of the cost of any such sequence of tracing waypoints.

Theorem 2 (Bounded Suboptimality). Let r > 0 and suppose that the sequence of waypoints {y_m}_{m=0}^M ⊂ X_free satisfies y_0 = x_init, y_M ∈ X_goal, ‖y_m − y_{m−1}‖ ≤ r for all m ∈ {1, ..., M}, and B(y_m, r) ⊂ X_free for all m ∈ {0, ..., M}. Let c_GMT∗ denote the cost of the path returned by GMT∗ using a connection radius r and group cost threshold factor λ. Then

    c_GMT∗ ≤ (1 + 2λ) Σ_{k=1}^{M} ‖y_k − y_{k−1}‖.

Proof.
In the subsequent analysis we will refer to the open set of nodes at iteration i as V_open(i). We first assume, without loss of generality, that ‖y_m − y_{m−2}‖ > r for all m ∈ {2, ..., M}. This condition may be enforced by making a forward pass over the sequence and omitting the preceding point y_{m−1} for any y_m that violates the condition, without affecting the other assumptions and only decreasing Σ_{k=1}^{M} ‖y_k − y_{k−1}‖. Consider running GMT∗ to completion and for each y_m let c(y_m) denote the cost-to-arrive of y_m in the generated tree. If y_m is not contained in any edge of the tree, we set c(y_m) = ∞. Let i_m denote the first iteration after which y_m has been added to the GMT∗ tree, that is, i_m = min{i ∈ N | y_m ∈ V_open(i)}.

Define S_0 = 0 and S_m = Σ_{k=1}^{m} ‖y_k − y_{k−1}‖, the cost of the path connecting y_0, y_1, ..., y_m, and denote δ = λr. We show by induction that for all m ∈ {1, ..., M}, one of the following two possibilities must hold:

    c_GMT∗ ≤ S_m + mδ,    (1)

or

    c(y_m) ≤ S_m + (m − 1)δ  and  i_m ≤ ⌈S_{m−1}/δ⌉ + m.    (2)

The base case m = 1 is trivial, since G contains only x_init = y_0, and thus the first GMT∗ iteration makes every collision-free connection between x_init and the nodes contained in B(x_init, r), including y_1. Then c(y_1) = ‖y_1 − y_0‖ = S_1 and i_1 = 1 = ⌈S_0/δ⌉ + 1. Now suppose (1) or (2) holds for m − 1; that means that one of the following four statements must hold:

1. c_GMT∗ ≤ S_{m−1} + (m − 1)δ,
2. c(y_{m−1}) ≤ S_{m−1} + (m − 2)δ and i_{m−1} ≤ ⌈S_{m−2}/δ⌉ + m − 1, and
   a. GMT∗ ends before considering y_m for connection (i_m = ∞), or
   b. y_{m−1} ∈ V_open when y_m is first considered for connection (i_{m−1} ≤ i_m − 1), or
   c. y_{m−1} ∉ V_open when y_m is first considered for connection (i_{m−1} ≥ i_m).

We show that in each of these cases, (1) or (2) holds for m.

Case 1: Since ‖y_m − y_{m−1}‖ ≥ 0, we have

    c_GMT∗ ≤ S_{m−1} + (m − 1)δ ≤ S_m + mδ.

Case 2a: The fact that y_m goes unconsidered means that up until the time the algorithm terminates at iteration i_term (upon finding a goal point in group G_{i_term}), y_{m−1} has never been a member of an expansion group:

    i_term ≤ max{i_{m−1}, ⌈c(y_{m−1})/δ⌉} ≤ max{⌈S_{m−2}/δ⌉ + m − 1, ⌈c(y_{m−1})/δ⌉}.

Then since the algorithm terminates at iteration i_term,

    c_GMT∗ ≤ i_term δ ≤ max{S_{m−2} + δ + (m − 1)δ, c(y_{m−1}) + δ} ≤ S_{m−1} + mδ.

Case 2b: y_m must be connected to some parent when it is first considered and y_{m−1} is a candidate, so

    c(y_m) ≤ c(y_{m−1}) + ‖y_{m−1} − y_m‖ ≤ S_m + (m − 2)δ ≤ S_m + (m − 1)δ.

Depending on whether y_{m−1} spends one or more steps in V_open, either i_m = i_{m−1} + 1 or i_m = ⌈c(y_{m−1})/δ⌉ + 1. In the first subcase, i_m ≤ ⌈S_{m−2}/δ⌉ + m ≤ ⌈S_{m−1}/δ⌉ + m; in the second subcase, i_m ≤ ⌈S_{m−1}/δ⌉ + m − 1.

Case 2c: When y_m is first considered for connection during iteration i′ < i_m ≤ i_{m−1}, there must be some z ∈ G_{i′} such that ‖y_m − z‖ ≤ r. Then

    c(y_m) ≤ c(z) + r ≤ i′δ + r ≤ (i_{m−1} − 1)δ + r ≤ S_{m−2} + (m − 1)δ + r.

Recalling that the y_m, by construction, are spaced so that they satisfy r < ‖y_m − y_{m−2}‖ ≤ ‖y_m − y_{m−1}‖ + ‖y_{m−1} − y_{m−2}‖, we have c(y_m) ≤ S_m + (m − 1)δ. Additionally, i_m ≤ i_{m−1} ≤ ⌈S_{m−2}/δ⌉ + m − 1 ≤ ⌈S_{m−1}/δ⌉ + m.

Thus the inductive step holds in all cases and (1) or (2) holds for all m. In particular, taking m = M we have c_GMT∗ ≤ S_M + Mδ or c_GMT∗ ≤ c(y_M) ≤ S_M + (M − 1)δ. Noting that S_M ≥ Mr/2 = Mδ/(2λ), we have

    c_GMT∗ ≤ (1 + 2λ) Σ_{k=1}^{M} ‖y_k − y_{k−1}‖,

as desired.

Given these results, we are now in a position to prove asymptotic optimality within a suboptimality bound.

Theorem 3 (GMT∗ Approximate Asymptotic Optimality). Assume a δ_obs-robustly feasible path planning problem, as defined in [3], with optimal path σ∗ and cost c∗. Let c_n denote the cost returned by GMT∗ with n samples. Then for any ε > 0,

    lim_{n→∞} P[c_n > (1 + ε)(1 + 2λ) c∗] = 0.

Proof.
The proof of this theorem is conceptually identical to that of Theorem VI.2 in [3]; with high probability as n → ∞ there exists a sequence of waypoints (Theorem 1) tracing an obstacle-clear, near-optimal path of cost ≤ (1 + ε) c∗, which GMT∗ recovers up to a factor (1 + 2λ) (Theorem 2).

We remark that extending the results of this section to differentially-constrained system dynamics or deterministic sampling methods does not substantially change the argument in Theorem 3 (although in particular, with deterministic sampling the convergence in probability may be replaced with a standard limit). All that is required is an analogue of Theorem 1, a statement on the regularity of the dynamics that implies, with sufficient sample density, that the near-neighbor disk graph contains near-optimal paths. The only GMT∗-specific analysis in Theorem 2 is a graph search bound, independent of dynamics or sampling methodology.

C. Numerical Experiments: Suboptimality Introduced
To complement the above theoretical bounds, in this subsection we examine the suboptimality incurred in practice through numerical experiments. While we will later describe implementation details and timing results in a few representative problems, this section's focus is solely on the amount of suboptimality resulting from the group expansion. Our figure of merit is thus only the percentage cost increase compared to FMT∗, i.e., from the group expansion. (As a footnote to the robustness assumption of Theorem 3: briefly, we require that σ∗ is a limit of paths with bounded clearance from X_obs; this may be regarded as a minimum regularity assumption to guard against problems with passages of infinitesimal width that are not amenable to sampling-based motion planning.)

Table I lists results for two geometric planning problems over a variety of dimensions (2D to 10D), the first of which is shown in Fig. 2. This obstacle set was mapped to dimensions greater than two by expanding obstacles to fully fill the space. Fig. 2b, in particular, shows the wave-like structure of the parallel expansion. The second planning problem listed in Table I is a maze environment requiring exploration in all dimensions. For each planning problem, the same setup is run with varying λ (values of 0.2, 0.5, and 1.0) and with sufficiently high sample counts to nearly converge to the optimal. In each case, the suboptimality is significantly below the proven bound, and often below 5%. We observe the expected increase in cost error with increasing λ, and an additional increase with increasing dimension.

Fig. 2: Expansion of the GMT∗ algorithm, where Fig. 2a shows the resulting tree (colored by cost) and Fig. 2b shows individual groups (denoted by color) expanded in parallel.

Cost Error (c_GMT∗/c_FMT∗ − 1)
Obstacle     d    λ = 0.2   λ = 0.5   λ = 1.0
Rectangles   2D   0.2%      0.6%      1.8%
(Fig. 2a)    3D   0.1%      0.6%      3.4%
             6D   0.4%      1.5%      2.1%
             10D  2.0%      14.8%     17.0%
Maze         3D   0.3%      1.5%      4.7%
             5D   0.9%      4.9%      7.8%
TABLE I: Suboptimality introduced by GMT∗ over FMT∗ for a range of dimensions d and group cost threshold factors λ. Results are averaged over 50 runs at n = 5000 samples. The variance was found to be small.

IV. GPU IMPLEMENTATION
We begin this section with a brief discussion of GPU architectures, as the ability of GMT∗ to exploit the computational capabilities of many-core GPUs is fundamental to this work and has driven much of the algorithm design and implementation. We particularly focus here on the CUDA-enabled GPUs used in this work. CUDA C functions running on GPUs are organized in a three-level thread hierarchy. At the lowest level, threads run in groups of 32 that execute one common instruction at a time, i.e., any divergence will cause branches to execute serially. A level above this, thread groups are combined into blocks, each of which can utilize a small, low-latency shared memory block and executes concurrently on the same multiprocessor, but is allowed to diverge without causing serial execution. Finally, at the highest level, blocks are formed in grids to be dispatched to the device. A more detailed discussion can be found in [26].

We highlight here three properties of GMT∗ that, along with the sample-level parallelism, allow efficient application to GPU architectures. First, the use of lazy collision checking limits thread divergence at low levels by only attempting to connect new samples to the tree once per iteration. Second, the design of GMT∗ is such that the sample set is partitioned into V_unexplored, V_open, and V_closed, with a sample always a member of one and only one set, allowing for little overlap of memory access and easy memory representation as Boolean masks. Our work accesses these sets with thread identifiers assigned via prefix sums, a strategy described in [27]. The use of this algorithmic primitive allows fast reorganization of sparse and uneven workloads into dense, uniform ones. Third, as the set of samples considered for expansion, G, can be represented as a set of cost-thresholded buckets, there is no need for serial data structures, e.g., min-heaps.

V. NUMERICAL EXPERIMENTS
A. Numerical Experiment Setup
As our goal is to show planning in changing, uncertain settings with dynamic systems, in this section we apply GMT∗ to the problem of planning under differential constraints with a 6D double integrator (ẍ = u) and a Dubins airplane (a Dubins car with altitude [28]). The double integrator planning problems consider a mixed time/quadratic control effort cost function, while the Dubins airplane problems consider a Euclidean distance cost function. The algorithm was implemented in CUDA C (example code may be accessed at github.com/StanfordASL/GMT) and run on an NVIDIA GeForce GTX 980 GPU on a Unix system with a 3.0 GHz CPU. We additionally provide a comparison with an embeddable GPU, the NVIDIA Jetson TX1, to show these performance gains are similarly available for onboard computation. Our implementation of GMT∗ samples the state space using the deterministic, low-dispersion Halton sequence to achieve best performance, following the discussion in [24]. Sampling and computation of nearest neighbor connections (edge discretization and neighbor sets) were performed offline in a precomputation phase.

B. Planning Under Differential Constraints
Through several motion planning problems, detailed in Fig. 3 and Table II, we demonstrate that GMT∗ achieves one to two orders of magnitude speed up, with relatively small performance losses, compared to an implementation of FMT∗ on a CPU [8] and an implementation of PRM∗ on a GPU [6]. For each simulation, we pick a value for the connection radius r appropriate for the dynamics [3][4] and set λ to 1, which we have found allows simple implementation, maximum parallelization, and performance losses on the order of 10%. The obstacles in our simulations are represented by unions of axis-aligned bounding boxes, as commonly used for broad-phase collision checking [2]. This methodology can provide increasingly accurate representations of obstacle sets as more boxes are used (e.g., as in Fig. 3b or with octree-based representations as in Fig. 3c).

The first problem (Fig. 3b) was built from point cloud data collected in [29] for an indoor office environment, with individual environmental elements bounded by boxes. The second planning problem (Fig. 3c) represents a cave system consisting of two maze-like levels connected by three passageways. Finally, our third planning problem (Fig. 3d) represents a forest environment. To show the results extend to systems with nonlinear dynamics (and nonholonomic planning), this last simulation uses Dubins airplane dynamics rather than double integrator dynamics. Our Dubins airplane consists of a planar Dubins car augmented with a single integrator in the third dimension, with bounded control on turning rate, unbounded altitude control, and a Euclidean distance cost function [28]. As shown in Table II, in all problems a solution trajectory is found by GMT∗ in ~10 ms, a speed increase of two orders of magnitude over CPU FMT∗ and one order of magnitude over GPU PRM∗.
The algorithm also performs well on the embedded platform, only slowing by a factor of two, compared to PRM∗, which slows down by approximately a factor of five. This demonstrates that GMT∗'s lightweight approach of building a single tree is particularly amenable to onboard computation. We also note that the cost increase incurred is less than 12% for all cases despite the high group cost threshold factor.

Fig. 3: The solution trajectory connecting x_init (blue) to X_goal (red) returned by GMT∗ is shown in green for all figures. (3a-3b) The indoor environment was generated with point cloud data from [29], with individual elements bounded by boxes. (3c) The cave environment consists of two maze-like levels, with wall outlines shown in black, connected by three passageways, shown in blue. (3d) The forest environment consists of many varying tree obstacles.

Table III further demonstrates the algorithm's scaling with sample and obstacle counts. The small increases in computation time with increasing sample count are a result of the GPU not being fully utilized at every iteration with lower sample counts, i.e., the group size may be too small to use every GPU core. The obstacle scaling too shows only slight increases in computation time with increased obstacle resolution, approximately doubling for each order of magnitude increase. However, if obstacles and the complexity of the space become a significant bottleneck, GMT∗ is amenable to space-partitioning structures, e.g., k-d trees, or parallelization at the obstacle level [20].

Double Integrator, Fig. 3b
Algorithm  Device     c_alg/c_GMT∗  Time (ms)  t_alg/t_GMT∗
FMT∗       CPU        0.91          1291       129.1
PRM∗       Embd. GPU  0.88          735        73.5
PRM∗       GPU        0.88          158        15.8
GMT∗       Embd. GPU  1             27         2.7
GMT∗       GPU        1             10         1

Double Integrator, Fig. 3c
Algorithm  Device     c_alg/c_GMT∗  Time (ms)  t_alg/t_GMT∗
FMT∗       CPU        0.91          1490       99.3
PRM∗       Embd. GPU  0.90          517        34.5
PRM∗       GPU        0.90          140        9.3
GMT∗       Embd. GPU  1             31         2.0
GMT∗       GPU        1             15         1.0

Dubins Airplane, Fig. 3d
Algorithm  Device     c_alg/c_GMT∗  Time (ms)  t_alg/t_GMT∗
FMT∗       CPU        0.95          1312       218.7
PRM∗       Embd. GPU  0.95          945        157.5
PRM∗       GPU        0.95          96         16.0
GMT∗       Embd. GPU  1             11         1.8
GMT∗       GPU        1             6          1
TABLE II: Results for algorithms run with 5000 samples in the environments in Fig. 3. GPU refers to the GTX 980, while Embd. GPU refers to an embeddable Jetson TX1 GPU.
Fig.  Sample Count (n)  Obstacle Count  Time (ms)  Cost
3c    1k                300             6          552
3c    2k                300             8          379
3c    5k                300             15         290
3c    10k               300             21         233
3d    5k                150             7          -
3d    5k                500             14         -
3d    5k                1500            18         -
3d    5k                5000            26         -

TABLE III: Results for GMT∗ scaling with sample and obstacle counts, in terms of cost and computation time. Note the obstacle count represents varying the resolution of the obstacle representations, not varying the obstacle location or class.

C. Planning in the Loop
The numerical experiments in Section V-B have shown that it is possible to plan at rates amenable to implementation within control loops by allowing some performance loss in exchange for parallelism. We now demonstrate that this strategy is beneficial through numerical experiments for a system operating in a dynamic environment with random state disturbances. The setup mimics a collapsing cave system (Fig. 3c), which a quadrotor modeled as a double integrator must escape. A successful escape requires high-performance actions to minimize time spent in the degrading cave as well as actions that account for state disturbances and variations in the environment. The collapse is modeled as randomly placed box obstacles added to the environment, with the rates representing the number of obstacles added each second. Fig. 4 shows the success rate over 50 runs of a quadrotor using a waypoint tracking controller, which tracks trajectories generated with FMT∗ and GMT∗ (replanning as quickly as possible). The results for FMT∗ show that, as expected, the success rate decreases quickly with increased noise and environmental degradation. Replanning with GMT∗, however, shows little variation in failure rate with increased noise level. The results further show significantly higher success rates for replanning with GMT∗ than for replanning with FMT∗. Note that these planning problems may not be possible to solve for every instance, as the collapses can happen anywhere within the environment (and very quickly), potentially trapping the quadrotor. Experiment videos are available at https://goo.gl/67RSsp.

Fig. 4: Success rate of a quadrotor replanning with FMT∗ or GMT∗ in the Fig. 3c environment with varying levels of state disturbances and cave collapse rates (simulated with obstacles randomly placed in the environment).

VI. CONCLUSION
We have introduced and analyzed a novel planning algorithm, the Group Marching Tree algorithm (GMT∗), that trades some optimality for parallelism in order to leverage GPU hardware. The computational speed of GMT∗ allows us to approach the problem of planning in real-world settings, particularly the uncertain, dynamic environments that naturally arise from active robot sensing and from the disturbed motion of systems in the field, by replanning at rates commensurate with the control loop frequency. Simulation results show planning times on the order of 10 ms (for a 6D double integrator and a Dubins airplane) and demonstrate the efficacy of planning at these rates in difficult environments.

This paper leaves several important research avenues open. Foremost, we plan to validate this approach experimentally on a platform with state and environmental sensing. We further plan to provide a more detailed theoretical analysis of GMT∗, such as time and space complexity analysis, and potentially to prove tighter suboptimality bounds. We additionally plan to explore extensions to other planning paradigms that the computational speed of GMT∗ may enable. To merge planning and game playing, we plan to use GMT∗ as a default simulation policy when many actions must be considered, such as in algorithms like Monte Carlo tree search. We also plan to show that GMT∗ may be used to construct policies in decision-making frameworks through fast approximation of the cost-to-go. Finally, we plan to demonstrate extensions to planning with a probabilistic state belief by utilizing a backwards search in cost-to-go space. In this way we can define actions over regions of the state space, with the same computation times shown above, and select actions from criteria such as best worst-case or maximum expected performance.

REFERENCES

[1] S. M. LaValle, “Motion planning: Wild frontiers,”
IEEE Robotics and Automation Magazine, 2011.
[2] ——, Planning Algorithms. Cambridge University Press, 2006.
[3] E. Schmerling, L. Janson, and M. Pavone, “Optimal sampling-based motion planning under differential constraints: the driftless case,” in Proc. IEEE Conf. on Robotics and Automation, 2015, extended version available at http://arxiv.org/abs/1403.2483/.
[4] ——, “Optimal sampling-based motion planning under differential constraints: the drift case with linear affine dynamics,” in Proc. IEEE Conf. on Decision and Control, 2015.
[5] L. E. Kavraki, P. Švestka, J.-C. Latombe, and M. H. Overmars, “Probabilistic roadmaps for path planning in high-dimensional spaces,” IEEE Transactions on Robotics and Automation, 1996.
[6] S. Karaman and E. Frazzoli, “Sampling-based algorithms for optimal motion planning,” Int. Journal of Robotics Research, 2011.
[7] R. Bohlin and L. Kavraki, “Path planning using lazy PRM,” in Proc. IEEE Conf. on Robotics and Automation, 2000.
[8] L. Janson, E. Schmerling, A. Clark, and M. Pavone, “Fast Marching Tree: a fast marching sampling-based method for optimal motion planning in many dimensions,” Int. Journal of Robotics Research, 2015.
[9] S. R. Lindemann and S. M. LaValle, “Simple and efficient algorithms for computing smooth, collision-free feedback laws over given cell decompositions,” Int. Journal of Robotics Research, 2009.
[10] W. Sun, S. Patil, and R. Alterovitz, “High-frequency replanning under uncertainty using parallel sampling-based motion planning,” IEEE Transactions on Robotics, 2015.
[11] M. Otte and E. Frazzoli, “RRTX: Asymptotically optimal single-query sampling-based motion planning with quick replanning,” Int. Journal of Robotics Research, 2016.
[12] A. Majumdar and R. Tedrake, “Robust online motion planning with regions of finite time invariance,” in Workshop on Algorithmic Foundations of Robotics, 2012.
[13] C. Richter, W. Vega-Brown, and N. Roy, “Bayesian learning for safe high-speed navigation in unknown environments,” in Int. Symp. on Robotics Research, 2015.
[14] S. Scherer, S. Singh, L. Chamberlain, and M. Elgersma, “Flying fast and low among obstacles: Methodology and experiments,” Int. Journal of Robotics Research, 2008.
[15] R. B. Dial, “Algorithm 360: Shortest-path forest with topological ordering [H],” Communications of the ACM, 1969.
[16] U. Meyer and P. Sanders, “∆-stepping: a parallelizable shortest path algorithm,” Journal of Algorithms, 2003.
[17] S. Kim, “An O(N) level set method for eikonal equations,” SIAM Journal on Scientific Computing, 2001.
[18] N. M. Amato and L. K. Dale, “Probabilistic roadmap methods are embarrassingly parallel,” in Proc. IEEE Conf. on Robotics and Automation, 1999.
[19] J. Pan, C. Lauterbach, and D. Manocha, “g-Planner: Real-time motion planning and global navigation using GPUs,” in Proc. AAAI Conf. on Artificial Intelligence, 2010.
[20] J. Bialkowski, S. Karaman, and E. Frazzoli, “Massively parallelizing the RRT and the RRT*,” in IEEE/RSJ Int. Conf. on Intelligent Robots & Systems, 2011.
[21] D. Devaurs, T. Siméon, and J. Cortés, “Parallelizing RRT on large-scale distributed-memory architectures,” IEEE Transactions on Robotics, 2013.
[22] C. Rodriguez, J. Denny, S. A. Jacobs, S. Thomas, and N. M. Amato, “Blind RRT: A probabilistically complete distributed RRT,” in IEEE/RSJ Int. Conf. on Intelligent Robots & Systems, 2013.
[23] A. Fidel, S. A. Jacobs, S. Sharma, N. M. Amato, and L. Rauchwerger, “Using load balancing to scalably parallelize sampling-based motion planning algorithms,” in IEEE Parallel and Distributed Processing Symposium, 2014.
[24] L. Janson, B. Ichter, and M. Pavone, “Deterministic sampling-based motion planning: Optimality, complexity, and performance,” Int. Journal of Robotics Research, 2015, conditionally accepted.
[25] J. A. Starek, J. V. Gomez, E. Schmerling, L. Janson, L. Moreno, and M. Pavone, “An asymptotically-optimal sampling-based algorithm for bi-directional motion planning,” in IEEE/RSJ Int. Conf. on Intelligent Robots & Systems, 2015.
[26] NVIDIA, CUDA C Programming Guide, 2016.
[27] D. Merrill, M. Garland, and A. Grimshaw, “Scalable GPU graph traversal,” in ACM SIGPLAN Notices, 2012.
[28] H. Chitsaz and S. M. LaValle, “Time-optimal paths for a Dubins airplane,” in Proc. IEEE Conf. on Decision and Control, 2007.
[29] I. Armeni, O. Sener, A. R. Zamir, H. Jiang, I. Brilakis, M. Fischer, and S. Savarese, “3D semantic parsing of large-scale indoor spaces,” in