Manipulation Planning Among Movable Obstacles Using Physics-Based Adaptive Motion Primitives
Dhruv Mauria Saxena, Muhammad Suhail Saleem, Maxim Likhachev
MManipulation Planning Among Movable Obstacles Using Physics-BasedAdaptive Motion Primitives
Dhruv Mauria Saxena*, Muhammad Suhail Saleem*, and Maxim Likhachev
Abstract — Robot manipulation in cluttered scenes often re-quires contact-rich interactions with objects. It can be moreeconomical to interact via non-prehensile actions, for example,push through other objects to get to the desired grasp pose,instead of deliberate prehensile rearrangement of the scene.For each object in a scene, depending on its properties, therobot may or may not be allowed to make contact with, tilt, ortopple it. To ensure that these constraints are satisfied duringnon-prehensile interactions, a planner can query a physics-based simulator to evaluate the complex multi-body interactionscaused by robot actions. Unfortunately, it is infeasible to querythe simulator for thousands of actions that need to be evaluatedin a typical planning problem as each simulation is time-consuming. In this work, we show that (i) manipulation tasks(specifically pick-and-place style tasks from a tabletop or arefrigerator) can often be solved by restricting robot-objectinteractions to adaptive motion primitives in a plan, (ii) theseactions can be incorporated as subgoals within a multi-heuristicsearch framework, and (iii) limiting interactions to these actionscan help reduce the time spent querying the simulator duringplanning by up to × in comparison to baseline algorithms.Our algorithm is evaluated in simulation and in the real-worldon a PR2 robot using PyBullet as our physics-based simulator.Supplementary video: https://youtu.be/ABQc7JbeJPM . I. I
NTRODUCTION
Manipulation planning problems in domestic households,industrial manufacturing and warehouses require contact-richinteractions between a robot and the objects in the envi-ronment. As the amount of clutter in a scene increases, thelikelihood of finding a completely collision-free trajectoryfor the manipulator decreases. This does not mean the taskis impossible since we might still be able to complete it bymoving the objects around. In these cases, non-prehensileinteractions with objects can be much faster than deliberatelyrearranging the scene via a sequence of slow pick-and-placestyle prehensile maneuvers. In addition, each object in acluttered scene is associated with constraints that define howa robot can manipulate it. For example, while we might beallowed to interact freely with a box of sugar, we might notbe allowed to tilt or topple a cup of coffee.We want to enable robots to grasp in clutter by using non-prehensile actions to interact with objects while satisfyingany object-centric constraints, e.g. constraints that dictatewhether or not we can make contact with, tilt, or topplean object. This domain of Manipulation Among Movable *Dhruv Mauria Saxena and Muhammad Suhail Saleem contributedequally to this work.The authors are with the Robotics Institute, Carnegie Mellon Univer-sity, Pittsburgh, PA 15213, USA. e-mail: { dsaxena, msaleem2,mlikhach } @andrew.cmu.edu Fig. 1. Manipulation tasks in cluttered tabletop (left) or refrigerator (right)workspaces require planners to account for complex multi-body interactionsbetween the robot and objects (green movable objects and red immovableobstacles). The goal is to pickup a randomly selected immovable obstacle.
Obstacles (
MAMO ) [1] is derived from prior work onNavigation Among Movable Obstacles (
NAMO ) [2], [3].Planning problems for
NAMO aim to find a feasible pathbetween start and goal states for a mobile robot navigatingin an environment with reconfigurable obstacles . We focuson the class of MAMO problems where the goal for the robotmanipulator is a 6D pre-grasp pose of an object in SE (3) without any constraints on the final poses of the movableobjects. The task of planning the grasp itself is not addressedby our algorithm.Motion planning for MAMO is computationally costlybecause of two major challenges. First, we need to accu-rately model the dynamics of the robot-object and object-object interactions in the environment during planning. Thisrequires the use of a high-fidelity physics-based simulatorsince hand-designed analytical models are hard to generalisefor complex object geometries and cluttered scenes. Thesimulator is used to model the environment and predictthe outcome of actions. The computational cost associatedwith running a simulator is high, which makes it infeasibleto query the simulator for every action that needs to beevaluated. The second challenge is associated with the searchspace of the problem. Since the robot may interact withobjects in the scene and reconfigure them, the search spaceneeds to include the configuration space of all these objects.This makes the search space for a planning problem in thisdomain grow exponentially with the number of objects, andmakes it computationally hard to find a solution.In this work, we make an observation that many
MAMO problems can be solved effectively by restricting robot-object In our work ‘objects’ may be movable, but ‘obstacles’ are immovable. a r X i v : . [ c s . R O ] F e b nteractions to adaptive motion primitives and show how thisobservation can be exploited to structure an efficient searchfor a contact-rich motion. Adaptive motion primitives ( AMP s,Section IV-B) are long-range actions generated on-the-flysuch that they terminate in a valid goal state [4]. In ourdomain, they are straight lines in the configuration space ofthe robot, between two states whose end-effector Cartesiancoordinates are within δ of each other in Euclidean norm.Since the goal in our domain is defined in the workspaceof the robot, an AMP is computed by linearly interpolatingbetween a state that satisfies the δ threshold condition andan inverse kinematics (IK) solution of the goal pose.Our central assumption in this work limits robot-objectinteractions to the final action (an AMP ) in a plan. Thisrestriction limits the class of
MAMO problems solvableby our algorithm to ones that require at most one robotaction near the goal to make contact with the objects inthe scene. We test our algorithm on random initialisationsof the cluttered tabletop and refrigerator scenes from Fig. 1.Empirically our results in Section VI show that even withthis restriction, our algorithm solves many
MAMO planningproblems and is up to × faster than competitive baselinesin our experiments.Our main contributions towards robot manipulation plan-ning among movable obstacles include: • a two-stage planning approach where we first parallelysample promising AMP s, and then systematically usethem in our planning algorithm to find a solution. • the use of these AMP s as subgoals within a multi-heuristic search algorithm. • an action evaluation scheme that minimises the timespent querying a simulator during planning.II. R ELATED W ORK
The domain of Manipulation Among Movable Obstacles(
MAMO ) is closely tied to prior work in the field ofNavigation Among Movable Obstacles (
NAMO ) [2], [3].Past works in the
MAMO domain have taken one of twopopular approaches - either solving problems via a sequenceof pick-and-place style maneuvers [1], or limiting solutionsto only planar robot-object interactions [5], [6], [7]. Therearrangement planning problem was extensively studied byKing [8] in their thesis which focused on non-prehensileinteractions.In this work we use a physics-based simulator in-the-loop during planning to account for dynamics of robot-object and object-object interactions in the scene. Plakuet. al. [9] decompose the planning problem into a high-level discrete space, and a low-level sampling-based plannerwith a complex dynamics model (physics-based simulator).Zickler and Veloso [10] attempt to solve physics-basedplanning problems with the help of high-level, long-rangerobot behaviours. Dogar et. al. [11] simulate and cachemultiple robot-object interactions, and use their result duringplanning to find feasible solutions. However, they do not al-low any object-object interactions, which can be unavoidablein cluttered
MAMO scenes. Similar to our work, the idea of planning till the proximity of the goal and using a moreexpensive specialized maneuver from within this proximitycan be seen in [12] in the context of grasp planning.Most relevant to our work in this paper are: a sampling-based planning algorithm KPIECE [13] and a search-basedplanning algorithm Selective Simulation [14]. Each usesdifferent approaches to incorporate a simulator in-the-loopduring planning. KPIECE is a sampling-based planner forapplications with complex dynamics that uses an importancefunction over discretised cells of the robot workspace toguide exploration, but calls the simulator for all action eval-uations. This is very computationally expensive for
MAMO and in our experimental comparisons with KPIECE we showthat intelligently limiting the number of simulator queriescan improve quantitative performance. Selective Simulationiteratively plans with simulations in a reduced search space(accounting for some objects) and executes the plan in asimulator (with all objects). If any object constraints areviolated upon execution, it decides on one of these objects tobe added to the search space for the next planning iteration.However, in the original paper Selective Simulation wasevaluated for simple constraints of contact or toppling offthe table. In addition to these, we consider constraints onhow far obstacles can be tilted and how much velocity canbe imparted to them. Selective Simulation is also proneto repeated simulations of similar actions which is timeconsuming and something we explicitly account for in ourwork with soft duplicate detection. Our experimental analysisalso includes comparison with Selective Simulation.III. P
ROBLEM F ORMULATION
A. Search Space
In this work, we denote the robot manipulator as R , and X R ⊂ R q as the configuration space for a q degrees-of-freedom (dofs) manipulator. The set of objects in the sceneis O = { O , . . . , O n } , and the configuration of any object X O i ∈ SE (3) includes the 3D position and orientation.The robot is equipped with a set of actions A . The searchspace for a planning problem in the MAMO domain is theCartesian product of the robot and all object configurationspaces, denoted as X = X R × X O × · · · × X O n . B. Object Constraints
Each object O i is associated with m i constraint functions k ji : X → { , } , ≤ j ≤ m i which return 1 if constraint k ji is satisfied, 0 otherwise. For example, an immovable obstacle(an object that is not allowed to be interacted with, forexample, a wall) will contain a constraint function whichevaluates to 1 so long as neither the robot nor any otherobject makes contact with it. We denote the evaluation of allconstraints of an object as K i ( x ) = (cid:2) k i ( x ) , . . . , k m i i ( x ) (cid:3) . Ifall constraints are satisfied, K i ( x ) = m i .We say a state is valid if all constraints for all objects aresatisfied at that state . Formally, state x is valid if K i ( x ) = We omit the necessary robot kinematic and dynamic feasibility con-straints from the definition of state validity for brevity. m i ∀ O i ∈ O , i ∈ { , . . . , n } . We denote the space of validstates as X V . C. Problem Statement
A planning problem in the
MAMO domain can be definedas the tuple P = ( X , A , x S , X G , T , c ) where x S ∈ X V isthe start state, X G ⊂ X , X G ∩ X V (cid:54) = ∅ is the set of goalconfigurations, T : X × A → X is a deterministic transitionfunction, and c : X × X → R ≥ is a state transition cost-function. We assume X G is defined using a desired end-effector pose in SE (3) (a grasp location for an object; graspplanning is outside the scope of this problem), which leads toa set of possible manipulator configurations, some of whichmust be valid ( X G ∩ X V (cid:54) = ∅ ) .A path π of length T is a sequence of states { x , . . . , x T } and has cost C ( π ) = (cid:80) T − i =1 c ( x i , x i +1 ) where x i +1 = T ( x i , a i ) . The goal for a MAMO planning problem is tofind a valid path from start to goal, i.e. a path made up of asequence of valid states. Formally, we can write this as anoptimisation problem:find π ∗ = arg min π C ( π ) s.t. x ∈ X V , ∀ x ∈ π (path of valid states) x = x S , x T ∈ X G (start, goal constraints) x i +1 = T ( x i , a i ) , a i ∈ A , ∀ x i , x i +1 ∈ π (transition dynamics) D. Graph Representation
We solve
MAMO planning problems using a search-based planning algorithm in X . Our graph representationcontains two types of actions in A - simple primitives and adaptive motion primitives ( AMP s). Each simple primitivechanges one joint angle of a robot by a small amount . Incomparison, an AMP is computed on the fly and can changeall coordinates in X R . A consequence of our core assumption(Section IV-C) is that for simple primitives a s ∈ A , avalid transition x (cid:48) = T ( x, a s ) implies that the state x andthe successor state x (cid:48) only differ in the robot configuration(in X R ). For AMP s a AMP ∈ A , a valid transition x (cid:48) = T ( x, a AMP ) can lead to differences in object configurations( X O × · · · × X O n ) in addition to a difference in X R .IV. A PPROACH
A. Action Evaluation
Following the ideas outlined in [14], we decompose ouraction evaluation scheme into a relatively fast collisionchecking routine and a much slower physics-based simu-lation. Collision checking involves checking for volumetricoverlaps between the collision models of the robot andobjects. This is computationally a relatively cheap operationthat can be easily implemented with the use of a distancefield. If and when necessary, an action that passes thiscollision checking phase might need to be simulated todetermine whether or not it violates any object constraints. In our implementation, ◦ or ◦ depending on the joint in our imple-mentation. Movable Object Immovable Obstacle
Fig. 2. For a particular goal state configuration x G ∈ X R , we can generate AMP s from several states x i within distance δ from it. This δ -sphere inconfiguration space X R might be occupied by both movable (green) objectsand immovable (red) obstacles. This can lead to invalid actions a , a , validaction a , and Phase 1 valid action a whose Phase 2 validity will bedetermined after simulation. The set of objects O in a scene can be separated into twosubsets - movable objects O M that the robot is allowed tointeract with, and immovable obstacles O I = O \ O M . Weassume that this separation is known a priori. Definition 1 (Phase 1 validity) . We say an action a ∈ A from state x ∈ X V is Phase 1 valid if it does not makecontact with any immovable obstacle O ∈ O I . Definition 2 (Phase 2 validity) . We say an action a ∈ A from state x ∈ X V is Phase 2 valid if it is Phase 1 valid andit does not result in any object constraint violations.The fast collision checking routine is used to determinePhase 1 validity of an action as it can quickly detect overlapswith immovable obstacles. This can also determine Phase 2validity if there is no overlap with any object. Note thatfor a ∈ A , x ∈ X V , x (cid:48) = T ( x, a ) , we call the collisionchecking routine for all intermediate states between x and x (cid:48) , including x (cid:48) but not x . In the case when an action is Phase1 valid, but also makes contact with some movable object(s),Phase 2 validity can only be determined after simulating it.This is because we need to account for the complex multi-body interactions that might result upon executing the action.These interactions might violate object constraints due toa movable object-immovable obstacle contact, or the robotviolating other movable object constraints such as tilting ortoppling. We emphasise that determining Phase 1 validity ofan action is computationally much cheaper (around peraction evaluation for a 7 dof manipulator) than determiningPhase 2 validity which requires simulating it (e.g., around . per action of a 7 dof manipulator in PyBullet). B. Adaptive Motion Primitives
Adaptive motion primitives (
AMP s) are IK-based motionprimitives that are generated on-the-fly as part of our algo-rithm [4] and included in A . For any robot configuration x ∈ X R , if the robot’s 3D Cartesian end-effector pose iswithin δ = 0 . of the goal end-effector pose X G , wegenerate an AMP that tries to connect x to X G . This is doneby obtaining an IK solution x G ∈ X R for X G , and linearlyinterpolating between x and x G . We choose this value of ovable ObjectImmovable ObstacleStartGoal Fig. 3. Consider planning between start and goal configurations shown onthe left. A “greedy” shortest-path search algorithm in the
MAMO domainwould proceed along the dotted path, exploring states in the light blueregion, and spend a lot of time simulating interactions with object O nearthe purple state v . Due to the assumptions we make, our search algorithmSPAMP proceeds along the dashed path via orange states u , and only startssimulating AMP s from states beyond u (cid:48) in the δ -sphere around goal x G . δ based on basic domain knowledge like the size of ourworkspaces and obstacles therein. We did not tune this valueto improve performance.The validity of an AMP (Phase 1 or Phase 2) is dependenton checking all interpolated states between x and x G . Fig. 2shows our action evaluation strategy from Section IV-A for AMP s. Since
AMP s terminate in a goal state, they can onlybe included in valid paths as the final action.
C. Assumptions
We make one assumption for solving
MAMO planningproblems in this work which is closely related to
AMP s andtheir use in our search algorithm. In this subsection we hopeto provide an intuitive justification for this assumption.
Assumption 1.
We only need to simulate
AMP s to find avalid solution for a
MAMO planning problem.
For grasping and reaching in cluttered scenes like thosewe consider in this work, there is a large volume of object-free space between the start configuration and goal region.Interactions with objects are necessary when the robot is ina region with a high degree of clutter, and clutter near thegoal is often the most pertinent for finding a feasible plan. inthe tabletop and refrigerator workspaces in our work. Basedon this observation we delay interacting with objects untilthe robot reaches a configuration near the goal.For a preset value of δ , we restrict robot-object interactionsuntil the end-effector is within δ of the goal pose. Since AMP s are long-range actions contained inside this δ -sphere,interactions that are vital to the success of a plan are oftenthe terminal AMP s in a plan (in comparison to the short-range simple primitives which do not lead to meaningfulinteractions). This leads us to Assumption 1. Consequently,since
AMP s are terminal actions, the valid solution pathswe find only contain a single action which interacts withobstacles. It is important to note that restricting interactionsto a single action does not limit the number of objectsthe robot can interact with. We illustrate the effect ofAssumption 1 in Fig. 3 by comparing our algorithm againsta naive search algorithm.
Movable ObjectImmovable Obstacle
Fig. 4. In case there is no interaction-free path between start state x S andthe δ -sphere around any goal state x G , our assumption of only simulatingterminal AMP s will cause our algorithm to not return a solution. A greatervalue of δ in this case could make this problem solvable by our algorithm. This assumption restricts the space of
MAMO planningproblems solvable by our algorithm to those that requireat most one
AMP to interact with objects near the goalconfiguration. Our success rates from Section VI suggest thatthis assumption is not restrictive for the high-clutter sceneswe consider in this work (Fig. 1). In cases where no suchconfiguration near the goal is achievable by the manipulatoras shown in Fig. 4, our algorithm will fail to find a solution.It might still be possible to find solutions in these cases bydynamically changing δ to find a valid AMP , but we havenot explored this yet.Assumption 1 helps us deal with the two major computa-tional challenges for
MAMO :1) The number of calls to the simulator go down signif-icantly, as we now only simulate terminal
AMP s thatinteract with movable objects as opposed to any actionthat interacts with movable objects.2) Since only terminal
AMP s might be simulated, we canplan from x S to inside the δ -sphere around some goalconfiguration purely in X R . This means that the searchspace for finding a path from x S to x T − reduces from X = X R ×X O ×· · ·×X O n to X R , thereby tackling theissue of a prohibitively large search space in clutteredenvironments. D. Subgoals
We solve
MAMO planning problems using a search-basedplanning algorithm. The performance of these algorithms isdependent on the quality of the heuristic functions used. Asit is often infeasible to create a single heuristic that canperfectly guide the search from start to goal in all scenarios,it is common practice in high-dimensional spaces to usea multi-heuristic framework. Multiple heuristics guide thesearch along multiple promising directions, which can helpovercome local minima associated with any one heuristic.Given the fact that simulations are the computationalbottleneck in our domain, and the assumption that we onlysimulate
AMP s, it is helpful to guide a search-based planningalgorithm to regions of the search space where a valid
AMP likely exists. Our two-stage planning approach discussed inSection V first finds
AMP s that are Phase 1 or Phase 2valid, and generates heuristic functions that guide the searchto the beginning of these actions. For an
AMP from state x T − ∈ X V , the corresponding heuristic function is a simpleEuclidean distance from x T − in X R . This first stage is run ovable ObjectImmovable Obstacle Fig. 5. If an
AMP from any x l ∈ L is found to be invalid, we postponethe simulation of AMP s from any other state x that is within β distancefrom it by inflating the heuristic value of that action. in parallel across multiple simulator instances, one per AMP sampled. The second stage runs a multi-heuristic search tofind a path from start state x S to a goal state in X G .The beginning of the AMP x T − can be thought of as a subgoal for the planner since we guide the search towardsit. While it is not necessary to reach the subgoal, exploringthe search space near it can help find a solution.If an AMP a AMP is Phase 2 valid, and the subgoal x T − isreachable without making contact with any object, we do notneed to simulate actions on the way to x T − . It suffices tofind a collision-free path in X R from start x S to x T − , andappend x T = T ( x T − , a AMP ) for a valid MAMO solution.If we only have access to Phase 1 valid subgoals, we allowour planning algorithm to simulate any Phase 1 valid
AMP sthat are generated during the search. To avoid the rare casewhen a subgoal is Phase 2 valid and unreachable (perhapsdue to kinematic limits or a scenario like Fig. 4), we allowour planner to simulate Phase 1 valid
AMP s after time t has elapsed during planning (Algorithm 1, Line 4) . This isimplemented by maintaining a priority queue of all Phase 1valid AMP s generated by the search, and simulating them inorder after time t . E. Soft Duplicate Detection for Action Evaluation
Assumption 1 states that we only simulate Phase 1 valid
AMP s during planning. However, since the number of such
AMP s in cluttered environments can be very large, we furtheroptimise the calls to the simulator by employing a softduplicate action detection scheme [15].Soft duplicate detection estimates the similarity betweenan action that needs to be evaluated in simulation and anaction that has already been simulated and deemed invalid(one which violated object constraints). If they are verysimilar then it is likely that the new action would also violatesome of the same constraints and be invalid. The idea ofpreferring promising actions based on the validity of similaractions can also be found in [16].We maintain a list of states L from which an AMP hasbeen simulated and found to be invalid. For any new state x ∈ X V from which we find a Phase 1 valid AMP , we first We set t = 30s for our experiments. Algorithm 1
Simulation-based Planning with
AMP s (SPAMP)
Input:
Planning problem P , number of AMP subgoals N , numberof AMP samples M , simulation start time t sim , planning timeout t max Output: solution path π procedure SPAMP( P , N, M, t sim , t max ) H ← GetValidSubgoals ( N, M ) (cid:46) Phase 1 or Phase 2 valid subgoals. t ← t sim (cid:46) Simulations are allowed starting from time t . if | H | = 0 or | IsPhase2Valid ( H ) | = 0 then t ← (cid:46) |·| is the set cardinality operation. OP EN ← InitialiseHeuristics ( H ) π ← P LAN ( P , H, OP EN, t, t max ) return π compute its Euclidean distances in X R to states in L . Givena preset threshold β , if (cid:107) x − x l (cid:107) < β for any x l ∈ L wepostpone the simulation of the AMP from x by artificiallyinflating the heuristic value of x and re-inserting it into thepriority queue it was expanded from. This ensures that x might be re-expanded from the same priority queue at a latertime, at which point we would simulate the AMP from it to agoal configuration. Fig. 5 shows how this process implicitlycreates β -spheres in configuration space around states thatlead to invalid AMP s due to any constraint violation.V. A
LGORITHM
Algorithm 1 is a high-level overview of our two-stageplanning pipeline. We call our algorithm Simulation-basedPlanning with
AMP s (SPAMP). The G ET V ALID S UBGOALS subroutine in Line 2 selects subgoals via rejection sampling. M Phase 1 valid
AMP s are randomly sampled and simu-lated in parallel. N Phase 2 valid subgoals are returned (ifavailable), else a combination of Phase 1 and Phase 2 validmake up the N returned subgoals (first stage). This sectiongoes into more details about our specific planning algorithmfrom Line 8 of Algorithm 1 (second stage). A. Multi-Heuristic Framework for
MAMO
We use Multi-Heuristic A* (MHA*) [17] as our searchalgorithm in this work. MHA* maintains multiple priorityqueues, one for each heuristic that is used. It was originallydeveloped under the assumption that all action evaluationstake roughly the same amount of time. This meant priorityqueues could be selected round robin for state expansionsto equitably distribute computational resources across thequeues. However, our action evaluations have varying timecomplexity (checking for Phase 1 vs. Phase 2 validity).Since some queues might need many simulator calls, andsome queues might never query the simulator, a simpleround robin strategy would lead to an uneven distribution ofcomputational resources across the queues. For this reason,we prioritise state expansions from queues that the search hasspent the least time expanding states from thus far. This time-based prioritisation of queues in MHA* leads to a much moreequitable allocation of computational resources for
MAMO . . Planning Algorithm Algorithm 2 contains details from the preceding sectionsto provide a more in-depth look at our planning algorithm.Details of the MHA* planning algorithm can be found inthe original publication [17]. In our algorithm,
OP EN isa set of priority queues, each of which is defined by aheuristic function. Following standard MHA* terminology,our anchor heuristic is a 3D Breadth-First Search (BFS)heuristic computed from the specified Cartesian goal end-effector position [18], taking into account the immovableobstacles in the scene. Every other heuristic is defined by acorresponding
AMP subgoal as per Section IV-D. The actionset A is made up of the simple primitives and AMP s via thefunction
GenerateAMP . Simple primitives and
AMP s aredenoted by a s and a AMP respectively (Section III-D).SPAMP terminates if the next-best state to expand x isin the goal set or we have already found a Phase 2 valid AMP from it. For every other x , the E XPAND functiongenerates and evaluates all possible successor states of x .These necessarily include successors x (cid:48) = T ( x, a s ) ∀ a s ∈ A (Line 13). In addition, we may generate and evaluate an AMP from x to x G ∈ X R , an inverse-kinematics solution for X G (Line 22). This evaluation (Line 23 onwards) occurs if x passes the end-effector distance check (Line 17) and softduplicate check (Line 18). The IsInteraction functionreturns true if a AMP ‘collides’ with a movable object duringthe Phase 1 validity check (Line 24).VI. E
XPERIMENTAL R ESULTS
We run all our experiments on the PR2 robot and usePyBullet [19] as our physics-based simulator. We run ex-periments in two different workspaces - a tabletop and arefrigerator. The objects in a scene are divided into im-movable and movable subsets prior to planning. The robotis allowed to interact with the movable objects but cannottilt them excessively, cause them to fall over or outside theworkspace (table or refrigerator), or impart high velocities.Neither the robot nor any movable object can make contactwith immovable obstacles.
A. Comparative Quantitative Evaluation in Simulation
We compare the performance of our algorithm Simulation-based Planning with
AMP s (SPAMP) against relevant state-of-the-art baseline algorithms that can be applied to the
MAMO domain - KPIECE [13] and Selective Simulation(SS) [14], which were described in Section II. Three variantsof KPIECE were tested. The first two use use the KPIECEimplementation from OMPL [20], but differ in how goalbiasing is implemented. K1 precomputes a set of 3 validgoal configurations by running IK before planning. K2 runsIK online (with random seeds) every time the search treeis grown towards the goal. K3 is our implementation ofKPIECE . The projective space for all KPIECE algorithms Per [13], we implement multiple levels of discretisation, goal biasing,and add all intermediate states of ‘motions’ to the search tree (somethingOMPL does not do). Additionally, we only simulate Phase 1 valid motions.
Algorithm 2
SPAMP procedure P LAN ( P , H, OP EN, t, t max ) ) Insert ( OP EN, x S ) (cid:46) Add to all queues. while OP EN is not empty do h ← BestQueue ( OP EN ) (cid:46) Time-based selection. x ← BestState ( h ) (cid:46) Pop from all queues. if x ∈ X G then return ExtractPath ( x ) if ∃ Phase 2 valid
AMP from x ∈ H then a AMP ← Phase 2 valid
AMP from x ∈ H return ExtractPath ( x ) ∪ {T ( x, a AMP ) } E XPAND ( x, OP EN, t ) procedure E XPAND ( x, OP EN, t ) for a s ∈ A do (cid:46) Simple primitives only. x (cid:48) ← T ( x, a ) . if x (cid:48) ∈ X V then Insert ( OP EN, x (cid:48) ) if (cid:107) FK ( x ) − X G (cid:107) ≤ δ then (cid:46)
3D end-effector poses. if x has soft duplicate then InflateHeuristic ( x ) Insert ( OP EN, x ) return x G ← IK ( X G ) (cid:46) Inverse kinematics. a AMP ← GenerateAMP ( x, x G ) if IsPhase1Valid ( a AMP ) then if IsInteraction ( a AMP ) and t elapsed > t then x (cid:48) ← Simulate ( a AMP ) if x (cid:48) ∈ X V then Insert ( OP EN, x (cid:48) ) else Insert ( L, x ) else if not Interaction ( a AMP ) then x (cid:48) ← T ( x, a AMP ) . Insert ( OP EN, x (cid:48) ) is the 3D Cartesian coordinate for the end-effector (via robotforward kinematics). In addition, we implemented a modifiedversion of Selective Simulation (SS2) which includes softduplicate detection on top of SS.A planning problem is initialised with objects selectedat random from the YCB Object Dataset [21]. Objects areplaced in the workspace at random. The goal for the PR2is to reach a pre-grasp pose for an immovable obstacle.Object masses are obtained from the YCB dataset, and theirPyBullet friction coefficients are randomly sampled from theinterval [0 . , . as the dataset does not provide any frictioncoefficient values. We run all planners on 180 randomlyinitialised planning problems in both workspaces.Table I shows quantitative results for all planners. We use12 objects on the tabletop (6 movable and 6 immovable), and5 objects in the refrigerator (3 movable and 2 immovable) .Sample initialisations of these workspaces are shown inFig. 1. SPAMP uses N = 3 subgoals from M = 8 samples in G ET V ALID S UBGOALS . All planners were givena maximum planning time of . We divide all planningproblems into two scenarios based on how long it takes aN
AIVE planner (a vanilla MHA* algorithm with only the 3DBFS heuristic to the goal) to solve them. N
AIVE simulates The tabletop is . × . , and refrigerator is . × . × . .ABLE IS IMULATION STUDY OF VARIOUS PLANNERS IN THE
MAMO
DOMAIN . Planning Algorithms
Tabletop Workspace (6 movable, 6 immovable) Refrigerator Workspace (3 movable, 2 immovable)
Metrics Scenario SPAMP
K1 K2 K3 SS SS2
SPAMP
K1 K2 K3 SS SS2
Success % Overall % % % % % % % % % % % % PlanningTime ( s ) Easy ± ±
301 339 ±
351 211 ±
369 6 ±
20 8 ± ± ±
293 210 ±
443 167 ±
222 7 ±
22 14 ± ± ±
310 464 ±
416 210 ±
327 22 ±
41 38 ± ± ±
119 128 ±
335 249 ±
425 57 ±
126 63 ± s ) Easy ± ±
301 339 ±
351 42 ±
88 4 ±
18 4 ± ± ±
293 208 ±
441 50 ±
89 3 ±
18 8 ± ± ±
310 463 ±
415 90 ±
131 4 ±
14 16 ± ± ±
118 137 ±
334 44 ±
92 20 ±
82 24 ± all Phase 1 valid AMP s. Problems solved by N
AIVE in lessthan are ‘Easy’, and the rest are ‘Difficult’.SPAMP achieves the highest success rate across all algo-rithms which shows that our assumption from Section IV-Cis not too restrictive for the
MAMO domain. In terms ofplanning times, SPAMP is − × faster than KPIECEand KPIECE2, and − × faster than S EL S IM and S EL -S IM
2. S EL S IM is most competitive in terms of planningtimes, but it is still − × slower than SPAMP for difficultplanning problems. By design, SPAMP spends most of thesimulation time finding valid subgoals for the search whichis still comparable to S EL S IM , and one or two orders ofmagnitude less than the other three baseline algorithms.KPIECE by default simulates all actions in the searchtree, spends almost all of its planning time in simulation,and in OMPL samples a random state in X R of thetime ( X R ⊂ R for a PR2 arm). Since each simulation takesaround . , and mostly random point-to-point exploration of X R (as implemented in OMPL) is wasteful, planning timesgrow quickly with the number of actions KPIECE evaluates. B. Runs on a Physical Robot
We setup the tabletop workspace experiment with thePR2 robot in our laboratory. We also set up a rudimentaryexperiment to calculate the coefficient of static friction asthe tangent of the incline angle of the table at which theobjects start sliding. We used this friction coefficient in oursimulator in an attempt to minimise the sim-to-real gap. Weselected 6 objects at random to initialise our scene and used asearch-based object localisation algorithm [22] on an NVidiaTITAN X GPU to detect the object poses for simulatorinitialisation. SPAMP was given a planning timeout, andobjects were instantiated in the simulator as movable with probability.The quantitative data from execution of 39 plans on thephysical robot is shown in Table II. A success rate of means that in 7 out of 39 of the executions, the robot violatedan obstacle constraint in the real-world. Since the plan foundby the robot must have been valid in simulation, constraintviolations in the real-world are due to a mismatch betweenthe simulator and the real-world. This could occur due tomodeling errors for the obstacles, execution errors on therobot, or perception errors in object localisation.A qualitative assessment of the 39 executions indicatesthat the two main sources of error in our experiment were
Fig. 6. Experimental setup for a PR2 robot in front of a tabletop workspacefor
MAMO . TABLE IIQ
UANTITATIVE P ERFORMANCE FOR R EAL -W ORLD E XPERIMENTS
MetricsAlgorithm
Success Rate Planning Time ( s ) Simulation Time ( s )SPAMP 82 % ± ± inaccurate friction coefficients and object localisations. Fig. 6shows an image of our experimental setup. Our supplementalvideo submission also includes successful executions by thePR2 in a refrigerator and cabinet workspace. C. In-Depth Analysis of
SPAMP in Simulation
To get a better understanding of the quantitative perfor-mance of SPAMP, we conducted experiments to highlightthe effect of various components. For our first experiment, wehighlight the effect of subgoals and soft duplicate detection.We consider four different planning algorithms - N
AIVE is avanilla MHA* algorithm with only the 3D BFS heuristic tothe goal; N
AIVE +DD uses soft duplicate detection on top ofthe N
AIVE planner, everything else being the same; S UB G TABLE IIIE
FFECT OF S UBGOALS AND S OFT DUPLICATE DETECTION
Metrics Scenario Planning Algorithms N AIVE N AIVE +DD S UB G S UB G+DDSuccessRate Overall 90 % % % % PlanningTime ( s ) Easy 13 ±
22 9 ±
21 7 ± ± Difficult 433 ±
388 188 ±
264 58 ± ± ABLE IVQ
UANTITAVE P ERFORMANCE OF
SPAMP V
ARIANTS (T ABLETOP ) Metrics Scenario Planning Algorithms N AIVE N AIVE +DD P
HASE SPAMP
SuccessRate Overall 85 % % % % PlanningTime ( s ) Easy 16 ±
22 14 ±
37 13 ± ± Difficult 524 ±
439 379 ±
461 48 ± ± SimulationTime ( s ) Easy 7 ±
13 4 ± ± ± Difficult 130 ±
292 63 ±
136 18 ± ± uses one randomly sampled Phase 1 valid AMP as a subgoalin MHA*; and S UB G+DD uses soft duplicate detection ontop of the S UB G planner. Problems in this experiment areinitialised with 8 movable objects on the tabletop. Table IIIshows the quantitative benefits of subgoals and soft duplicatedetection individually, and that in tandem they can greatlyimprove performance over the N
AIVE planner, which can beconsidered a lower-bound on performance for any planningalgorithm in this domain.In a second experiment, we compare the performance ofSPAMP against three related variants on the same tabletopworkspace experiment from Section VI-A. We compareagainst N
AIVE , N
AIVE +DD, and also a planner (P
HASE N = 3 Phase 1 valid
AMP s withoutsimulation for use as subgoals. All three of these baselinesare allowed to simulate
AMP s from within the δ -sphere ofa goal configuration from the outset. Table IV shows thatwhile simply using Phase 1 valid subgoals has clear benefitsover not using them, the necessary reliance on simulating allPhase 1 valid AMP s leads to higher simulation times, andthereby higher planning times, as compared to SPAMP.All planners in this section utilise Assumption 1 fromSection IV-C. If a planning problem is unsolvable due tothe assumption being violated, that failure is common to allplanners. All other failures for the experiments in this sectionare due to planners exceeding the timeout (1800s) .VII. D
ISCUSSION & F
UTURE W ORK
In this work we present SPAMP, an algorithm forSimulation-based Planning with Adaptive Motion Primitivesfor the
MAMO domain. We use
AMP s as subgoals withina multi-heuristic search framework to solve manipulationplanning problems in cluttered scenes. SPAMP improvesplanning times by up to − × over KPIECE, andup to − × over Selective Simulation, two state-of-the-art baselines for the MAMO domain. SPAMP also reducessimulation times by up to − × in comparison tothese baselines. We show that our assumption of restrictingrobot-object interactions to terminal AMP s in a plan isnot restrictive since we solve − of all problems.In our future work, we hope to relax our assumption thatinteractions may only occur during terminal AMP s in orderto solve all planning problems in the
MAMO domain. R
EFERENCES[1] M. Stilman, J. Schamburek, J. Kuffner, and T. Asfour, “Manipulationplanning among movable obstacles,” in . IEEE, 2007.[2] G. T. Wilfong, “Motion planning in the presence of movable obsta-cles,”
Ann. Math. Artif. Intell. , 1991.[3] M. Stilman and J. J. Kuffner, “Navigation among movable obstacles:Real-time reasoning in complex environments,”
Int. J. HumanoidRobotics , 2005.[4] B. J. Cohen, G. Subramania, S. Chitta, and M. Likhachev, “Planningfor manipulation with adaptive motion primitives,” in
IEEE Interna-tional Conference on Robotics and Automation, ICRA 2011 . IEEE,2011.[5] J. P. van den Berg, M. Stilman, J. Kuffner, M. C. Lin, and D. Manocha,“Path planning among movable obstacles: A probabilistically completeapproach,” in
Eighth International Workshop on the AlgorithmicFoundations of Robotics, WAFR 2008 .[6] M. R. Dogar and S. S. Srinivasa, “A planning framework for non-prehensile manipulation under clutter and uncertainty,”
Auton. Robots ,2012.[7] J. E. King, M. Cognetti, and S. S. Srinivasa, “Rearrangement planningusing object-centric and robot-centric action spaces,” in . IEEE,2016.[8] J. E. King, “Robust rearrangement planning using nonprehensile in-teraction,” Ph.D. dissertation, Carnegie Mellon University, Pittsburgh,PA, December 2016.[9] E. Plaku, L. E. Kavraki, and M. Y. Vardi, “Motion planning withdynamics by a synergistic combination of layers of planning,”
IEEETrans. Robotics , 2010.[10] S. Zickler and M. M. Veloso, “Efficient physics-based planning:sampling search via non-deterministic tactics and skills,” in . IFAAMAS, 2009.[11] M. R. Dogar, K. Hsiao, M. T. Ciocarlie, and S. S. Srinivasa, “Physics-based grasp planning through clutter,” in
Robotics: Science andSystems VIII, 2012 , 2012.[12] N. Vahrenkamp, M. Do, T. Asfour, and R. Dillmann, “Integrated graspand motion planning,” in . IEEE, 2010, pp. 2883–2888.[13] I. A. Sucan and L. E. Kavraki, “A sampling-based tree planner forsystems with complex dynamics,”
IEEE Trans. Robotics , 2012.[14] M. Suhail Saleem and M. Likhachev, “Planning with selective physics-based simulation for manipulation among movable objects,” in ,2020.[15] W. Du, S. Kim, O. Salzman, and M. Likhachev, “Escaping localminima in search-based planning using soft duplicate detection,” in . IEEE, 2019.[16] J. D. Hern´andez, M. Moll, and L. E. Kavraki, “Lazy evaluation ofgoal specifications guided by motion planning,” in . IEEE, 2019, pp.944–950.[17] S. Aine, S. Swaminathan, V. Narayanan, V. Hwang, and M. Likhachev,“Multi-heuristic a,”
The International Journal of Robotics Research ,2016.[18] B. J. Cohen, S. Chitta, and M. Likhachev, “Search-based planningfor manipulation with motion primitives,” in
IEEE InternationalConference on Robotics and Automation, ICRA 2010 . IEEE, 2010.[19] E. Coumans and Y. Bai, “Pybullet, a python module for physics sim-ulation for games, robotics and machine learning,” http://pybullet.org,2016–2019.[20] I. A. S¸ucan, M. Moll, and L. E. Kavraki, “The Open Motion PlanningLibrary,”
IEEE Robotics & Automation Magazine , vol. 19, no. 4, pp.72–82, December 2012, https://ompl.kavrakilab.org.[21] B. C¸ alli, A. Singh, J. Bruce, A. Walsman, K. Konolige, S. S. Srinivasa,P. Abbeel, and A. M. Dollar, “Yale-cmu-berkeley dataset for roboticmanipulation research,”
Int. J. Robotics Res. [22] A. Agarwal, Y. Han, and M. Likhachev, “Perch 2.0 : Fast and accurategpu-based perception via search for object pose estimation,” in