[PDF] Manipulation Planning Among Movable Obstacles Using Physics-Based Adaptive Motion Primitives

Abstract

Robot manipulation in cluttered scenes often requires contact-rich interactions with objects. It can be more economical to interact via non-prehensile actions, for example, push through other objects to get to the desired grasp pose, instead of deliberate prehensile rearrangement of the scene. For each object in a scene, depending on its properties, the robot may or may not be allowed to make contact with, tilt, or topple it. To ensure that these constraints are satisfied during non-prehensile interactions, a planner can query a physics-based simulator to evaluate the complex multi-body interactions caused by robot actions. Unfortunately, it is infeasible to query the simulator for thousands of actions that need to be evaluated in a typical planning problem as each simulation is time-consuming. In this work, we show that (i) manipulation tasks (specifically pick-and-place style tasks from a tabletop or a refrigerator) can often be solved by restricting robot-object interactions to adaptive motion primitives in a plan, (ii) these actions can be incorporated as subgoals within a multi-heuristic search framework, and (iii) limiting interactions to these actions can help reduce the time spent querying the simulator during planning by up to 40x in comparison to baseline algorithms. Our algorithm is evaluated in simulation and in the real-world on a PR2 robot using PyBullet as our physics-based simulator. Supplementary video: \url{this https URL}.

Full PDF

MManipulation Planning Among Movable Obstacles Using Physics-BasedAdaptive Motion Primitives

Dhruv Mauria Saxena*, Muhammad Suhail Saleem*, and Maxim Likhachev

Abstract — Robot manipulation in cluttered scenes often re-quires contact-rich interactions with objects. It can be moreeconomical to interact via non-prehensile actions, for example,push through other objects to get to the desired grasp pose,instead of deliberate prehensile rearrangement of the scene.For each object in a scene, depending on its properties, therobot may or may not be allowed to make contact with, tilt, ortopple it. To ensure that these constraints are satisﬁed duringnon-prehensile interactions, a planner can query a physics-based simulator to evaluate the complex multi-body interactionscaused by robot actions. Unfortunately, it is infeasible to querythe simulator for thousands of actions that need to be evaluatedin a typical planning problem as each simulation is time-consuming. In this work, we show that (i) manipulation tasks(speciﬁcally pick-and-place style tasks from a tabletop or arefrigerator) can often be solved by restricting robot-objectinteractions to adaptive motion primitives in a plan, (ii) theseactions can be incorporated as subgoals within a multi-heuristicsearch framework, and (iii) limiting interactions to these actionscan help reduce the time spent querying the simulator duringplanning by up to × in comparison to baseline algorithms.Our algorithm is evaluated in simulation and in the real-worldon a PR2 robot using PyBullet as our physics-based simulator.Supplementary video: https://youtu.be/ABQc7JbeJPM . I. I

NTRODUCTION

Manipulation planning problems in domestic households,industrial manufacturing and warehouses require contact-richinteractions between a robot and the objects in the envi-ronment. As the amount of clutter in a scene increases, thelikelihood of ﬁnding a completely collision-free trajectoryfor the manipulator decreases. This does not mean the taskis impossible since we might still be able to complete it bymoving the objects around. In these cases, non-prehensileinteractions with objects can be much faster than deliberatelyrearranging the scene via a sequence of slow pick-and-placestyle prehensile maneuvers. In addition, each object in acluttered scene is associated with constraints that deﬁne howa robot can manipulate it. For example, while we might beallowed to interact freely with a box of sugar, we might notbe allowed to tilt or topple a cup of coffee.We want to enable robots to grasp in clutter by using non-prehensile actions to interact with objects while satisfyingany object-centric constraints, e.g. constraints that dictatewhether or not we can make contact with, tilt, or topplean object. This domain of Manipulation Among Movable *Dhruv Mauria Saxena and Muhammad Suhail Saleem contributedequally to this work.The authors are with the Robotics Institute, Carnegie Mellon Univer-sity, Pittsburgh, PA 15213, USA. e-mail: { dsaxena, msaleem2,mlikhach } @andrew.cmu.edu Fig. 1. Manipulation tasks in cluttered tabletop (left) or refrigerator (right)workspaces require planners to account for complex multi-body interactionsbetween the robot and objects (green movable objects and red immovableobstacles). The goal is to pickup a randomly selected immovable obstacle.

Obstacles (

MAMO ) [1] is derived from prior work onNavigation Among Movable Obstacles (

NAMO ) [2], [3].Planning problems for

NAMO aim to ﬁnd a feasible pathbetween start and goal states for a mobile robot navigatingin an environment with reconﬁgurable obstacles . We focuson the class of MAMO problems where the goal for the robotmanipulator is a 6D pre-grasp pose of an object in SE (3) without any constraints on the ﬁnal poses of the movableobjects. The task of planning the grasp itself is not addressedby our algorithm.Motion planning for MAMO is computationally costlybecause of two major challenges. First, we need to accu-rately model the dynamics of the robot-object and object-object interactions in the environment during planning. Thisrequires the use of a high-ﬁdelity physics-based simulatorsince hand-designed analytical models are hard to generalisefor complex object geometries and cluttered scenes. Thesimulator is used to model the environment and predictthe outcome of actions. The computational cost associatedwith running a simulator is high, which makes it infeasibleto query the simulator for every action that needs to beevaluated. The second challenge is associated with the searchspace of the problem. Since the robot may interact withobjects in the scene and reconﬁgure them, the search spaceneeds to include the conﬁguration space of all these objects.This makes the search space for a planning problem in thisdomain grow exponentially with the number of objects, andmakes it computationally hard to ﬁnd a solution.In this work, we make an observation that many

MAMO problems can be solved effectively by restricting robot-object In our work ‘objects’ may be movable, but ‘obstacles’ are immovable. a r X i v : . [ c s . R O ] F e b nteractions to adaptive motion primitives and show how thisobservation can be exploited to structure an efﬁcient searchfor a contact-rich motion. Adaptive motion primitives ( AMP s,Section IV-B) are long-range actions generated on-the-ﬂysuch that they terminate in a valid goal state [4]. In ourdomain, they are straight lines in the conﬁguration space ofthe robot, between two states whose end-effector Cartesiancoordinates are within δ of each other in Euclidean norm.Since the goal in our domain is deﬁned in the workspaceof the robot, an AMP is computed by linearly interpolatingbetween a state that satisﬁes the δ threshold condition andan inverse kinematics (IK) solution of the goal pose.Our central assumption in this work limits robot-objectinteractions to the ﬁnal action (an AMP ) in a plan. Thisrestriction limits the class of

MAMO problems solvableby our algorithm to ones that require at most one robotaction near the goal to make contact with the objects inthe scene. We test our algorithm on random initialisationsof the cluttered tabletop and refrigerator scenes from Fig. 1.Empirically our results in Section VI show that even withthis restriction, our algorithm solves many

MAMO planningproblems and is up to × faster than competitive baselinesin our experiments.Our main contributions towards robot manipulation plan-ning among movable obstacles include: • a two-stage planning approach where we ﬁrst parallelysample promising AMP s, and then systematically usethem in our planning algorithm to ﬁnd a solution. • the use of these AMP s as subgoals within a multi-heuristic search algorithm. • an action evaluation scheme that minimises the timespent querying a simulator during planning.II. R ELATED W ORK

The domain of Manipulation Among Movable Obstacles(

MAMO ) is closely tied to prior work in the ﬁeld ofNavigation Among Movable Obstacles (

NAMO ) [2], [3].Past works in the

MAMO domain have taken one of twopopular approaches - either solving problems via a sequenceof pick-and-place style maneuvers [1], or limiting solutionsto only planar robot-object interactions [5], [6], [7]. Therearrangement planning problem was extensively studied byKing [8] in their thesis which focused on non-prehensileinteractions.In this work we use a physics-based simulator in-the-loop during planning to account for dynamics of robot-object and object-object interactions in the scene. Plakuet. al. [9] decompose the planning problem into a high-level discrete space, and a low-level sampling-based plannerwith a complex dynamics model (physics-based simulator).Zickler and Veloso [10] attempt to solve physics-basedplanning problems with the help of high-level, long-rangerobot behaviours. Dogar et. al. [11] simulate and cachemultiple robot-object interactions, and use their result duringplanning to ﬁnd feasible solutions. However, they do not al-low any object-object interactions, which can be unavoidablein cluttered

MAMO scenes. Similar to our work, the idea of planning till the proximity of the goal and using a moreexpensive specialized maneuver from within this proximitycan be seen in [12] in the context of grasp planning.Most relevant to our work in this paper are: a sampling-based planning algorithm KPIECE [13] and a search-basedplanning algorithm Selective Simulation [14]. Each usesdifferent approaches to incorporate a simulator in-the-loopduring planning. KPIECE is a sampling-based planner forapplications with complex dynamics that uses an importancefunction over discretised cells of the robot workspace toguide exploration, but calls the simulator for all action eval-uations. This is very computationally expensive for

MAMO and in our experimental comparisons with KPIECE we showthat intelligently limiting the number of simulator queriescan improve quantitative performance. Selective Simulationiteratively plans with simulations in a reduced search space(accounting for some objects) and executes the plan in asimulator (with all objects). If any object constraints areviolated upon execution, it decides on one of these objects tobe added to the search space for the next planning iteration.However, in the original paper Selective Simulation wasevaluated for simple constraints of contact or toppling offthe table. In addition to these, we consider constraints onhow far obstacles can be tilted and how much velocity canbe imparted to them. Selective Simulation is also proneto repeated simulations of similar actions which is timeconsuming and something we explicitly account for in ourwork with soft duplicate detection. Our experimental analysisalso includes comparison with Selective Simulation.III. P

ROBLEM F ORMULATION

A. Search Space

In this work, we denote the robot manipulator as R , and X R ⊂ R q as the conﬁguration space for a q degrees-of-freedom (dofs) manipulator. The set of objects in the sceneis O = { O , . . . , O n } , and the conﬁguration of any object X O i ∈ SE (3) includes the 3D position and orientation.The robot is equipped with a set of actions A . The searchspace for a planning problem in the MAMO domain is theCartesian product of the robot and all object conﬁgurationspaces, denoted as X = X R × X O × · · · × X O n . B. Object Constraints

Each object O i is associated with m i constraint functions k ji : X → { , } , ≤ j ≤ m i which return 1 if constraint k ji is satisﬁed, 0 otherwise. For example, an immovable obstacle(an object that is not allowed to be interacted with, forexample, a wall) will contain a constraint function whichevaluates to 1 so long as neither the robot nor any otherobject makes contact with it. We denote the evaluation of allconstraints of an object as K i ( x ) = (cid:2) k i ( x ) , . . . , k m i i ( x ) (cid:3) . Ifall constraints are satisﬁed, K i ( x ) = m i .We say a state is valid if all constraints for all objects aresatisﬁed at that state . Formally, state x is valid if K i ( x ) = We omit the necessary robot kinematic and dynamic feasibility con-straints from the deﬁnition of state validity for brevity. m i ∀ O i ∈ O , i ∈ { , . . . , n } . We denote the space of validstates as X V . C. Problem Statement

A planning problem in the

MAMO domain can be deﬁnedas the tuple P = ( X , A , x S , X G , T , c ) where x S ∈ X V isthe start state, X G ⊂ X , X G ∩ X V (cid:54) = ∅ is the set of goalconﬁgurations, T : X × A → X is a deterministic transitionfunction, and c : X × X → R ≥ is a state transition cost-function. We assume X G is deﬁned using a desired end-effector pose in SE (3) (a grasp location for an object; graspplanning is outside the scope of this problem), which leads toa set of possible manipulator conﬁgurations, some of whichmust be valid ( X G ∩ X V (cid:54) = ∅ ) .A path π of length T is a sequence of states { x , . . . , x T } and has cost C ( π ) = (cid:80) T − i =1 c ( x i , x i +1 ) where x i +1 = T ( x i , a i ) . The goal for a MAMO planning problem is toﬁnd a valid path from start to goal, i.e. a path made up of asequence of valid states. Formally, we can write this as anoptimisation problem:ﬁnd π ∗ = arg min π C ( π ) s.t. x ∈ X V , ∀ x ∈ π (path of valid states) x = x S , x T ∈ X G (start, goal constraints) x i +1 = T ( x i , a i ) , a i ∈ A , ∀ x i , x i +1 ∈ π (transition dynamics) D. Graph Representation

We solve

MAMO planning problems using a search-based planning algorithm in X . Our graph representationcontains two types of actions in A - simple primitives and adaptive motion primitives ( AMP s). Each simple primitivechanges one joint angle of a robot by a small amount . Incomparison, an AMP is computed on the ﬂy and can changeall coordinates in X R . A consequence of our core assumption(Section IV-C) is that for simple primitives a s ∈ A , avalid transition x (cid:48) = T ( x, a s ) implies that the state x andthe successor state x (cid:48) only differ in the robot conﬁguration(in X R ). For AMP s a AMP ∈ A , a valid transition x (cid:48) = T ( x, a AMP ) can lead to differences in object conﬁgurations( X O × · · · × X O n ) in addition to a difference in X R .IV. A PPROACH

A. Action Evaluation

Following the ideas outlined in [14], we decompose ouraction evaluation scheme into a relatively fast collisionchecking routine and a much slower physics-based simu-lation. Collision checking involves checking for volumetricoverlaps between the collision models of the robot andobjects. This is computationally a relatively cheap operationthat can be easily implemented with the use of a distanceﬁeld. If and when necessary, an action that passes thiscollision checking phase might need to be simulated todetermine whether or not it violates any object constraints. In our implementation, ◦ or ◦ depending on the joint in our imple-mentation. Movable Object Immovable Obstacle

Fig. 2. For a particular goal state conﬁguration x G ∈ X R , we can generate AMP s from several states x i within distance δ from it. This δ -sphere inconﬁguration space X R might be occupied by both movable (green) objectsand immovable (red) obstacles. This can lead to invalid actions a , a , validaction a , and Phase 1 valid action a whose Phase 2 validity will bedetermined after simulation. The set of objects O in a scene can be separated into twosubsets - movable objects O M that the robot is allowed tointeract with, and immovable obstacles O I = O \ O M . Weassume that this separation is known a priori. Deﬁnition 1 (Phase 1 validity) . We say an action a ∈ A from state x ∈ X V is Phase 1 valid if it does not makecontact with any immovable obstacle O ∈ O I . Deﬁnition 2 (Phase 2 validity) . We say an action a ∈ A from state x ∈ X V is Phase 2 valid if it is Phase 1 valid andit does not result in any object constraint violations.The fast collision checking routine is used to determinePhase 1 validity of an action as it can quickly detect overlapswith immovable obstacles. This can also determine Phase 2validity if there is no overlap with any object. Note thatfor a ∈ A , x ∈ X V , x (cid:48) = T ( x, a ) , we call the collisionchecking routine for all intermediate states between x and x (cid:48) , including x (cid:48) but not x . In the case when an action is Phase1 valid, but also makes contact with some movable object(s),Phase 2 validity can only be determined after simulating it.This is because we need to account for the complex multi-body interactions that might result upon executing the action.These interactions might violate object constraints due toa movable object-immovable obstacle contact, or the robotviolating other movable object constraints such as tilting ortoppling. We emphasise that determining Phase 1 validity ofan action is computationally much cheaper (around peraction evaluation for a 7 dof manipulator) than determiningPhase 2 validity which requires simulating it (e.g., around . per action of a 7 dof manipulator in PyBullet). B. Adaptive Motion Primitives

Adaptive motion primitives (

AMP s) are IK-based motionprimitives that are generated on-the-ﬂy as part of our algo-rithm [4] and included in A . For any robot conﬁguration x ∈ X R , if the robot’s 3D Cartesian end-effector pose iswithin δ = 0 . of the goal end-effector pose X G , wegenerate an AMP that tries to connect x to X G . This is doneby obtaining an IK solution x G ∈ X R for X G , and linearlyinterpolating between x and x G . We choose this value of ovable ObjectImmovable ObstacleStartGoal Fig. 3. Consider planning between start and goal conﬁgurations shown onthe left. A “greedy” shortest-path search algorithm in the

MAMO domainwould proceed along the dotted path, exploring states in the light blueregion, and spend a lot of time simulating interactions with object O nearthe purple state v . Due to the assumptions we make, our search algorithmSPAMP proceeds along the dashed path via orange states u , and only startssimulating AMP s from states beyond u (cid:48) in the δ -sphere around goal x G . δ based on basic domain knowledge like the size of ourworkspaces and obstacles therein. We did not tune this valueto improve performance.The validity of an AMP (Phase 1 or Phase 2) is dependenton checking all interpolated states between x and x G . Fig. 2shows our action evaluation strategy from Section IV-A for AMP s. Since

AMP s terminate in a goal state, they can onlybe included in valid paths as the ﬁnal action.

C. Assumptions

We make one assumption for solving

MAMO planningproblems in this work which is closely related to

AMP s andtheir use in our search algorithm. In this subsection we hopeto provide an intuitive justiﬁcation for this assumption.

Assumption 1.

We only need to simulate

AMP s to ﬁnd avalid solution for a

MAMO planning problem.

For grasping and reaching in cluttered scenes like thosewe consider in this work, there is a large volume of object-free space between the start conﬁguration and goal region.Interactions with objects are necessary when the robot is ina region with a high degree of clutter, and clutter near thegoal is often the most pertinent for ﬁnding a feasible plan. inthe tabletop and refrigerator workspaces in our work. Basedon this observation we delay interacting with objects untilthe robot reaches a conﬁguration near the goal.For a preset value of δ , we restrict robot-object interactionsuntil the end-effector is within δ of the goal pose. Since AMP s are long-range actions contained inside this δ -sphere,interactions that are vital to the success of a plan are oftenthe terminal AMP s in a plan (in comparison to the short-range simple primitives which do not lead to meaningfulinteractions). This leads us to Assumption 1. Consequently,since

AMP s are terminal actions, the valid solution pathswe ﬁnd only contain a single action which interacts withobstacles. It is important to note that restricting interactionsto a single action does not limit the number of objectsthe robot can interact with. We illustrate the effect ofAssumption 1 in Fig. 3 by comparing our algorithm againsta naive search algorithm.

Movable ObjectImmovable Obstacle

Fig. 4. In case there is no interaction-free path between start state x S andthe δ -sphere around any goal state x G , our assumption of only simulatingterminal AMP s will cause our algorithm to not return a solution. A greatervalue of δ in this case could make this problem solvable by our algorithm. This assumption restricts the space of

MAMO planningproblems solvable by our algorithm to those that requireat most one

AMP to interact with objects near the goalconﬁguration. Our success rates from Section VI suggest thatthis assumption is not restrictive for the high-clutter sceneswe consider in this work (Fig. 1). In cases where no suchconﬁguration near the goal is achievable by the manipulatoras shown in Fig. 4, our algorithm will fail to ﬁnd a solution.It might still be possible to ﬁnd solutions in these cases bydynamically changing δ to ﬁnd a valid AMP , but we havenot explored this yet.Assumption 1 helps us deal with the two major computa-tional challenges for

MAMO :1) The number of calls to the simulator go down signif-icantly, as we now only simulate terminal

AMP s thatinteract with movable objects as opposed to any actionthat interacts with movable objects.2) Since only terminal

AMP s might be simulated, we canplan from x S to inside the δ -sphere around some goalconﬁguration purely in X R . This means that the searchspace for ﬁnding a path from x S to x T − reduces from X = X R ×X O ×· · ·×X O n to X R , thereby tackling theissue of a prohibitively large search space in clutteredenvironments. D. Subgoals

We solve

MAMO planning problems using a search-basedplanning algorithm. The performance of these algorithms isdependent on the quality of the heuristic functions used. Asit is often infeasible to create a single heuristic that canperfectly guide the search from start to goal in all scenarios,it is common practice in high-dimensional spaces to usea multi-heuristic framework. Multiple heuristics guide thesearch along multiple promising directions, which can helpovercome local minima associated with any one heuristic.Given the fact that simulations are the computationalbottleneck in our domain, and the assumption that we onlysimulate

AMP s, it is helpful to guide a search-based planningalgorithm to regions of the search space where a valid

AMP likely exists. Our two-stage planning approach discussed inSection V ﬁrst ﬁnds

AMP s that are Phase 1 or Phase 2valid, and generates heuristic functions that guide the searchto the beginning of these actions. For an

AMP from state x T − ∈ X V , the corresponding heuristic function is a simpleEuclidean distance from x T − in X R . This ﬁrst stage is run ovable ObjectImmovable Obstacle Fig. 5. If an

AMP from any x l ∈ L is found to be invalid, we postponethe simulation of AMP s from any other state x that is within β distancefrom it by inﬂating the heuristic value of that action. in parallel across multiple simulator instances, one per AMP sampled. The second stage runs a multi-heuristic search toﬁnd a path from start state x S to a goal state in X G .The beginning of the AMP x T − can be thought of as a subgoal for the planner since we guide the search towardsit. While it is not necessary to reach the subgoal, exploringthe search space near it can help ﬁnd a solution.If an AMP a AMP is Phase 2 valid, and the subgoal x T − isreachable without making contact with any object, we do notneed to simulate actions on the way to x T − . It sufﬁces toﬁnd a collision-free path in X R from start x S to x T − , andappend x T = T ( x T − , a AMP ) for a valid MAMO solution.If we only have access to Phase 1 valid subgoals, we allowour planning algorithm to simulate any Phase 1 valid

AMP sthat are generated during the search. To avoid the rare casewhen a subgoal is Phase 2 valid and unreachable (perhapsdue to kinematic limits or a scenario like Fig. 4), we allowour planner to simulate Phase 1 valid

AMP s after time t has elapsed during planning (Algorithm 1, Line 4) . This isimplemented by maintaining a priority queue of all Phase 1valid AMP s generated by the search, and simulating them inorder after time t . E. Soft Duplicate Detection for Action Evaluation

Assumption 1 states that we only simulate Phase 1 valid

AMP s during planning. However, since the number of such

AMP s in cluttered environments can be very large, we furtheroptimise the calls to the simulator by employing a softduplicate action detection scheme [15].Soft duplicate detection estimates the similarity betweenan action that needs to be evaluated in simulation and anaction that has already been simulated and deemed invalid(one which violated object constraints). If they are verysimilar then it is likely that the new action would also violatesome of the same constraints and be invalid. The idea ofpreferring promising actions based on the validity of similaractions can also be found in [16].We maintain a list of states L from which an AMP hasbeen simulated and found to be invalid. For any new state x ∈ X V from which we ﬁnd a Phase 1 valid AMP , we ﬁrst We set t = 30s for our experiments. Algorithm 1

Simulation-based Planning with

AMP s (SPAMP)

Input:

Planning problem P , number of AMP subgoals N , numberof AMP samples M , simulation start time t sim , planning timeout t max Output: solution path π procedure SPAMP( P , N, M, t sim , t max ) H ← GetValidSubgoals ( N, M ) (cid:46) Phase 1 or Phase 2 valid subgoals. t ← t sim (cid:46) Simulations are allowed starting from time t . if | H | = 0 or | IsPhase2Valid ( H ) | = 0 then t ← (cid:46) |·| is the set cardinality operation. OP EN ← InitialiseHeuristics ( H ) π ← P LAN ( P , H, OP EN, t, t max ) return π compute its Euclidean distances in X R to states in L . Givena preset threshold β , if (cid:107) x − x l (cid:107) < β for any x l ∈ L wepostpone the simulation of the AMP from x by artiﬁciallyinﬂating the heuristic value of x and re-inserting it into thepriority queue it was expanded from. This ensures that x might be re-expanded from the same priority queue at a latertime, at which point we would simulate the AMP from it to agoal conﬁguration. Fig. 5 shows how this process implicitlycreates β -spheres in conﬁguration space around states thatlead to invalid AMP s due to any constraint violation.V. A

LGORITHM

Algorithm 1 is a high-level overview of our two-stageplanning pipeline. We call our algorithm Simulation-basedPlanning with

AMP s (SPAMP). The G ET V ALID S UBGOALS subroutine in Line 2 selects subgoals via rejection sampling. M Phase 1 valid

AMP s are randomly sampled and simu-lated in parallel. N Phase 2 valid subgoals are returned (ifavailable), else a combination of Phase 1 and Phase 2 validmake up the N returned subgoals (ﬁrst stage). This sectiongoes into more details about our speciﬁc planning algorithmfrom Line 8 of Algorithm 1 (second stage). A. Multi-Heuristic Framework for

MAMO

We use Multi-Heuristic A* (MHA*) [17] as our searchalgorithm in this work. MHA* maintains multiple priorityqueues, one for each heuristic that is used. It was originallydeveloped under the assumption that all action evaluationstake roughly the same amount of time. This meant priorityqueues could be selected round robin for state expansionsto equitably distribute computational resources across thequeues. However, our action evaluations have varying timecomplexity (checking for Phase 1 vs. Phase 2 validity).Since some queues might need many simulator calls, andsome queues might never query the simulator, a simpleround robin strategy would lead to an uneven distribution ofcomputational resources across the queues. For this reason,we prioritise state expansions from queues that the search hasspent the least time expanding states from thus far. This time-based prioritisation of queues in MHA* leads to a much moreequitable allocation of computational resources for

MAMO . . Planning Algorithm Algorithm 2 contains details from the preceding sectionsto provide a more in-depth look at our planning algorithm.Details of the MHA* planning algorithm can be found inthe original publication [17]. In our algorithm,

OP EN isa set of priority queues, each of which is deﬁned by aheuristic function. Following standard MHA* terminology,our anchor heuristic is a 3D Breadth-First Search (BFS)heuristic computed from the speciﬁed Cartesian goal end-effector position [18], taking into account the immovableobstacles in the scene. Every other heuristic is deﬁned by acorresponding

AMP subgoal as per Section IV-D. The actionset A is made up of the simple primitives and AMP s via thefunction

GenerateAMP . Simple primitives and

AMP s aredenoted by a s and a AMP respectively (Section III-D).SPAMP terminates if the next-best state to expand x isin the goal set or we have already found a Phase 2 valid AMP from it. For every other x , the E XPAND functiongenerates and evaluates all possible successor states of x .These necessarily include successors x (cid:48) = T ( x, a s ) ∀ a s ∈ A (Line 13). In addition, we may generate and evaluate an AMP from x to x G ∈ X R , an inverse-kinematics solution for X G (Line 22). This evaluation (Line 23 onwards) occurs if x passes the end-effector distance check (Line 17) and softduplicate check (Line 18). The IsInteraction functionreturns true if a AMP ‘collides’ with a movable object duringthe Phase 1 validity check (Line 24).VI. E

XPERIMENTAL R ESULTS

We run all our experiments on the PR2 robot and usePyBullet [19] as our physics-based simulator. We run ex-periments in two different workspaces - a tabletop and arefrigerator. The objects in a scene are divided into im-movable and movable subsets prior to planning. The robotis allowed to interact with the movable objects but cannottilt them excessively, cause them to fall over or outside theworkspace (table or refrigerator), or impart high velocities.Neither the robot nor any movable object can make contactwith immovable obstacles.

A. Comparative Quantitative Evaluation in Simulation

We compare the performance of our algorithm Simulation-based Planning with

AMP s (SPAMP) against relevant state-of-the-art baseline algorithms that can be applied to the

MAMO domain - KPIECE [13] and Selective Simulation(SS) [14], which were described in Section II. Three variantsof KPIECE were tested. The ﬁrst two use use the KPIECEimplementation from OMPL [20], but differ in how goalbiasing is implemented. K1 precomputes a set of 3 validgoal conﬁgurations by running IK before planning. K2 runsIK online (with random seeds) every time the search treeis grown towards the goal. K3 is our implementation ofKPIECE . The projective space for all KPIECE algorithms Per [13], we implement multiple levels of discretisation, goal biasing,and add all intermediate states of ‘motions’ to the search tree (somethingOMPL does not do). Additionally, we only simulate Phase 1 valid motions.

Algorithm 2

SPAMP procedure P LAN ( P , H, OP EN, t, t max ) ) Insert ( OP EN, x S ) (cid:46) Add to all queues. while OP EN is not empty do h ← BestQueue ( OP EN ) (cid:46) Time-based selection. x ← BestState ( h ) (cid:46) Pop from all queues. if x ∈ X G then return ExtractPath ( x ) if ∃ Phase 2 valid

AMP from x ∈ H then a AMP ← Phase 2 valid

AMP from x ∈ H return ExtractPath ( x ) ∪ {T ( x, a AMP ) } E XPAND ( x, OP EN, t ) procedure E XPAND ( x, OP EN, t ) for a s ∈ A do (cid:46) Simple primitives only. x (cid:48) ← T ( x, a ) . if x (cid:48) ∈ X V then Insert ( OP EN, x (cid:48) ) if (cid:107) FK ( x ) − X G (cid:107) ≤ δ then (cid:46)

3D end-effector poses. if x has soft duplicate then InflateHeuristic ( x ) Insert ( OP EN, x ) return x G ← IK ( X G ) (cid:46) Inverse kinematics. a AMP ← GenerateAMP ( x, x G ) if IsPhase1Valid ( a AMP ) then if IsInteraction ( a AMP ) and t elapsed > t then x (cid:48) ← Simulate ( a AMP ) if x (cid:48) ∈ X V then Insert ( OP EN, x (cid:48) ) else Insert ( L, x ) else if not Interaction ( a AMP ) then x (cid:48) ← T ( x, a AMP ) . Insert ( OP EN, x (cid:48) ) is the 3D Cartesian coordinate for the end-effector (via robotforward kinematics). In addition, we implemented a modiﬁedversion of Selective Simulation (SS2) which includes softduplicate detection on top of SS.A planning problem is initialised with objects selectedat random from the YCB Object Dataset [21]. Objects areplaced in the workspace at random. The goal for the PR2is to reach a pre-grasp pose for an immovable obstacle.Object masses are obtained from the YCB dataset, and theirPyBullet friction coefﬁcients are randomly sampled from theinterval [0 . , . as the dataset does not provide any frictioncoefﬁcient values. We run all planners on 180 randomlyinitialised planning problems in both workspaces.Table I shows quantitative results for all planners. We use12 objects on the tabletop (6 movable and 6 immovable), and5 objects in the refrigerator (3 movable and 2 immovable) .Sample initialisations of these workspaces are shown inFig. 1. SPAMP uses N = 3 subgoals from M = 8 samples in G ET V ALID S UBGOALS . All planners were givena maximum planning time of . We divide all planningproblems into two scenarios based on how long it takes aN

AIVE planner (a vanilla MHA* algorithm with only the 3DBFS heuristic to the goal) to solve them. N

AIVE simulates The tabletop is . × . , and refrigerator is . × . × . .ABLE IS IMULATION STUDY OF VARIOUS PLANNERS IN THE

MAMO

DOMAIN . Planning Algorithms

Tabletop Workspace (6 movable, 6 immovable) Refrigerator Workspace (3 movable, 2 immovable)

Metrics Scenario SPAMP

K1 K2 K3 SS SS2

SPAMP

K1 K2 K3 SS SS2

Success % Overall % % % % % % % % % % % % PlanningTime ( s ) Easy ± ±

301 339 ±

351 211 ±

369 6 ±

20 8 ± ± ±

293 210 ±

443 167 ±

222 7 ±

22 14 ± ± ±

310 464 ±

416 210 ±

327 22 ±

41 38 ± ± ±

119 128 ±

335 249 ±

425 57 ±

126 63 ± s ) Easy ± ±

301 339 ±

351 42 ±

88 4 ±

18 4 ± ± ±

293 208 ±

441 50 ±

89 3 ±

18 8 ± ± ±

310 463 ±

415 90 ±

131 4 ±

14 16 ± ± ±

118 137 ±

334 44 ±

92 20 ±

82 24 ± all Phase 1 valid AMP s. Problems solved by N

AIVE in lessthan are ‘Easy’, and the rest are ‘Difﬁcult’.SPAMP achieves the highest success rate across all algo-rithms which shows that our assumption from Section IV-Cis not too restrictive for the

MAMO domain. In terms ofplanning times, SPAMP is − × faster than KPIECEand KPIECE2, and − × faster than S EL S IM and S EL -S IM

2. S EL S IM is most competitive in terms of planningtimes, but it is still − × slower than SPAMP for difﬁcultplanning problems. By design, SPAMP spends most of thesimulation time ﬁnding valid subgoals for the search whichis still comparable to S EL S IM , and one or two orders ofmagnitude less than the other three baseline algorithms.KPIECE by default simulates all actions in the searchtree, spends almost all of its planning time in simulation,and in OMPL samples a random state in X R of thetime ( X R ⊂ R for a PR2 arm). Since each simulation takesaround . , and mostly random point-to-point exploration of X R (as implemented in OMPL) is wasteful, planning timesgrow quickly with the number of actions KPIECE evaluates. B. Runs on a Physical Robot

We setup the tabletop workspace experiment with thePR2 robot in our laboratory. We also set up a rudimentaryexperiment to calculate the coefﬁcient of static friction asthe tangent of the incline angle of the table at which theobjects start sliding. We used this friction coefﬁcient in oursimulator in an attempt to minimise the sim-to-real gap. Weselected 6 objects at random to initialise our scene and used asearch-based object localisation algorithm [22] on an NVidiaTITAN X GPU to detect the object poses for simulatorinitialisation. SPAMP was given a planning timeout, andobjects were instantiated in the simulator as movable with probability.The quantitative data from execution of 39 plans on thephysical robot is shown in Table II. A success rate of means that in 7 out of 39 of the executions, the robot violatedan obstacle constraint in the real-world. Since the plan foundby the robot must have been valid in simulation, constraintviolations in the real-world are due to a mismatch betweenthe simulator and the real-world. This could occur due tomodeling errors for the obstacles, execution errors on therobot, or perception errors in object localisation.A qualitative assessment of the 39 executions indicatesthat the two main sources of error in our experiment were

Fig. 6. Experimental setup for a PR2 robot in front of a tabletop workspacefor

MAMO . TABLE IIQ

UANTITATIVE P ERFORMANCE FOR R EAL -W ORLD E XPERIMENTS

MetricsAlgorithm

Success Rate Planning Time ( s ) Simulation Time ( s )SPAMP 82 % ± ± inaccurate friction coefﬁcients and object localisations. Fig. 6shows an image of our experimental setup. Our supplementalvideo submission also includes successful executions by thePR2 in a refrigerator and cabinet workspace. C. In-Depth Analysis of

SPAMP in Simulation

To get a better understanding of the quantitative perfor-mance of SPAMP, we conducted experiments to highlightthe effect of various components. For our ﬁrst experiment, wehighlight the effect of subgoals and soft duplicate detection.We consider four different planning algorithms - N

AIVE is avanilla MHA* algorithm with only the 3D BFS heuristic tothe goal; N

AIVE +DD uses soft duplicate detection on top ofthe N

AIVE planner, everything else being the same; S UB G TABLE IIIE

FFECT OF S UBGOALS AND S OFT DUPLICATE DETECTION

Metrics Scenario Planning Algorithms N AIVE N AIVE +DD S UB G S UB G+DDSuccessRate Overall 90 % % % % PlanningTime ( s ) Easy 13 ±

22 9 ±

21 7 ± ± Difﬁcult 433 ±

388 188 ±

264 58 ± ± ABLE IVQ

UANTITAVE P ERFORMANCE OF

SPAMP V

ARIANTS (T ABLETOP ) Metrics Scenario Planning Algorithms N AIVE N AIVE +DD P

HASE SPAMP

SuccessRate Overall 85 % % % % PlanningTime ( s ) Easy 16 ±

22 14 ±

37 13 ± ± Difﬁcult 524 ±

439 379 ±

461 48 ± ± SimulationTime ( s ) Easy 7 ±

13 4 ± ± ± Difﬁcult 130 ±

292 63 ±

136 18 ± ± uses one randomly sampled Phase 1 valid AMP as a subgoalin MHA*; and S UB G+DD uses soft duplicate detection ontop of the S UB G planner. Problems in this experiment areinitialised with 8 movable objects on the tabletop. Table IIIshows the quantitative beneﬁts of subgoals and soft duplicatedetection individually, and that in tandem they can greatlyimprove performance over the N

AIVE planner, which can beconsidered a lower-bound on performance for any planningalgorithm in this domain.In a second experiment, we compare the performance ofSPAMP against three related variants on the same tabletopworkspace experiment from Section VI-A. We compareagainst N

AIVE , N

AIVE +DD, and also a planner (P

HASE N = 3 Phase 1 valid

AMP s withoutsimulation for use as subgoals. All three of these baselinesare allowed to simulate

AMP s from within the δ -sphere ofa goal conﬁguration from the outset. Table IV shows thatwhile simply using Phase 1 valid subgoals has clear beneﬁtsover not using them, the necessary reliance on simulating allPhase 1 valid AMP s leads to higher simulation times, andthereby higher planning times, as compared to SPAMP.All planners in this section utilise Assumption 1 fromSection IV-C. If a planning problem is unsolvable due tothe assumption being violated, that failure is common to allplanners. All other failures for the experiments in this sectionare due to planners exceeding the timeout (1800s) .VII. D

ISCUSSION & F

UTURE W ORK

In this work we present SPAMP, an algorithm forSimulation-based Planning with Adaptive Motion Primitivesfor the

MAMO domain. We use

AMP s as subgoals withina multi-heuristic search framework to solve manipulationplanning problems in cluttered scenes. SPAMP improvesplanning times by up to − × over KPIECE, andup to − × over Selective Simulation, two state-of-the-art baselines for the MAMO domain. SPAMP also reducessimulation times by up to − × in comparison tothese baselines. We show that our assumption of restrictingrobot-object interactions to terminal AMP s in a plan isnot restrictive since we solve − of all problems.In our future work, we hope to relax our assumption thatinteractions may only occur during terminal AMP s in orderto solve all planning problems in the

MAMO domain. R

EFERENCES[1] M. Stilman, J. Schamburek, J. Kuffner, and T. Asfour, “Manipulationplanning among movable obstacles,” in . IEEE, 2007.[2] G. T. Wilfong, “Motion planning in the presence of movable obsta-cles,”

Ann. Math. Artif. Intell. , 1991.[3] M. Stilman and J. J. Kuffner, “Navigation among movable obstacles:Real-time reasoning in complex environments,”

Int. J. HumanoidRobotics , 2005.[4] B. J. Cohen, G. Subramania, S. Chitta, and M. Likhachev, “Planningfor manipulation with adaptive motion primitives,” in

IEEE Interna-tional Conference on Robotics and Automation, ICRA 2011 . IEEE,2011.[5] J. P. van den Berg, M. Stilman, J. Kuffner, M. C. Lin, and D. Manocha,“Path planning among movable obstacles: A probabilistically completeapproach,” in

Eighth International Workshop on the AlgorithmicFoundations of Robotics, WAFR 2008 .[6] M. R. Dogar and S. S. Srinivasa, “A planning framework for non-prehensile manipulation under clutter and uncertainty,”

Auton. Robots ,2012.[7] J. E. King, M. Cognetti, and S. S. Srinivasa, “Rearrangement planningusing object-centric and robot-centric action spaces,” in . IEEE,2016.[8] J. E. King, “Robust rearrangement planning using nonprehensile in-teraction,” Ph.D. dissertation, Carnegie Mellon University, Pittsburgh,PA, December 2016.[9] E. Plaku, L. E. Kavraki, and M. Y. Vardi, “Motion planning withdynamics by a synergistic combination of layers of planning,”

IEEETrans. Robotics , 2010.[10] S. Zickler and M. M. Veloso, “Efﬁcient physics-based planning:sampling search via non-deterministic tactics and skills,” in . IFAAMAS, 2009.[11] M. R. Dogar, K. Hsiao, M. T. Ciocarlie, and S. S. Srinivasa, “Physics-based grasp planning through clutter,” in

Robotics: Science andSystems VIII, 2012 , 2012.[12] N. Vahrenkamp, M. Do, T. Asfour, and R. Dillmann, “Integrated graspand motion planning,” in . IEEE, 2010, pp. 2883–2888.[13] I. A. Sucan and L. E. Kavraki, “A sampling-based tree planner forsystems with complex dynamics,”

IEEE Trans. Robotics , 2012.[14] M. Suhail Saleem and M. Likhachev, “Planning with selective physics-based simulation for manipulation among movable objects,” in ,2020.[15] W. Du, S. Kim, O. Salzman, and M. Likhachev, “Escaping localminima in search-based planning using soft duplicate detection,” in . IEEE, 2019.[16] J. D. Hern´andez, M. Moll, and L. E. Kavraki, “Lazy evaluation ofgoal speciﬁcations guided by motion planning,” in . IEEE, 2019, pp.944–950.[17] S. Aine, S. Swaminathan, V. Narayanan, V. Hwang, and M. Likhachev,“Multi-heuristic a,”

The International Journal of Robotics Research ,2016.[18] B. J. Cohen, S. Chitta, and M. Likhachev, “Search-based planningfor manipulation with motion primitives,” in

IEEE InternationalConference on Robotics and Automation, ICRA 2010 . IEEE, 2010.[19] E. Coumans and Y. Bai, “Pybullet, a python module for physics sim-ulation for games, robotics and machine learning,” http://pybullet.org,2016–2019.[20] I. A. S¸ucan, M. Moll, and L. E. Kavraki, “The Open Motion PlanningLibrary,”

IEEE Robotics & Automation Magazine , vol. 19, no. 4, pp.72–82, December 2012, https://ompl.kavrakilab.org.[21] B. C¸ alli, A. Singh, J. Bruce, A. Walsman, K. Konolige, S. S. Srinivasa,P. Abbeel, and A. M. Dollar, “Yale-cmu-berkeley dataset for roboticmanipulation research,”

Int. J. Robotics Res. [22] A. Agarwal, Y. Han, and M. Likhachev, “Perch 2.0 : Fast and accurategpu-based perception via search for object pose estimation,” in