Learning to Navigate Cloth using Haptics
Alexander Clegg, Wenhao Yu, Zackory Erickson, Jie Tan, C. Karen Liu, Greg Turk
Abstract — We present a controller that allows an arm-like manipulator to navigate deformable cloth garments in simulation through the use of haptic information. The main challenge for such a controller is to avoid getting tangled in, tearing, or punching through the deforming cloth. Our controller aggregates force information from a number of haptic-sensing spheres placed all along the manipulator for guidance. Based on haptic forces, each individual sphere updates its target location, and the conflicts that arise between this set of desired positions are resolved by solving an inverse kinematics problem with constraints. Reinforcement learning is used to train the controller for a single haptic-sensing sphere, where a training run is terminated (and thus penalized) when large forces are detected due to contact between the sphere and a simplified model of the cloth. In simulation, we demonstrate successful navigation of a robotic arm through a variety of garments, including an isolated sleeve, a jacket, a shirt, and shorts. Our controller outperforms two baseline controllers: one without haptics and another that was trained based on large forces between the sphere and cloth, but without early termination.

Alexander Clegg, Wenhao Yu, Greg Turk, and C. Karen Liu are with the School of Interactive Computing, Georgia Institute of Technology, Atlanta, GA, USA. Zackory Erickson is with the Healthcare Robotics Lab, Georgia Institute of Technology, Atlanta, GA, USA. Jie Tan is with Google Brain, Google, Mountain View, CA, USA. Alexander Clegg is the corresponding author: [email protected]. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-1650044 and NSF award IIS-1514258.
I. INTRODUCTION

While research in manipulation of deformable objects has made great progress in recent years, autonomous robots today still face tremendous challenges when manipulating deformable objects in the everyday human world. Most previous work has focused on planning the trajectories of the grippers/end effectors to manipulate the object into a desired configuration. In contrast, we are interested in a different problem domain in which the whole manipulator must navigate around deformable objects to achieve a geometric goal state. This problem is representative of a wide variety of robotic applications, such as a manipulator retrieving objects from foliage, a snake robot navigating through rubble, a surgical end effector moving through a patient's esophagus, or a humanoid putting on a hazmat suit.

This work focuses on one of the most challenging manipulation tasks in everyday life: dressing. The goal of the dressing task is to navigate the garment to achieve a desired relative positioning of the garment and the limb. This is a challenging task, as the motion of clothing, especially in response to contact forces, is highly complex and difficult to predict. To prevent damage to clothing and increase the chance of successfully completing the task, we posit that
haptic feedback is a vital component of any control system attempting to navigate and manipulate the state of cloth. Previous work designed specialized dressing controllers without the use of haptic perception [1]. These controllers tend to be sensitive to the initial position of the manipulator relative to the garment and to the materials of the garment, even though perfect vision and augmented environment information (e.g., geodesic distance encoded on the surface of the clothes) are provided.

The goal of this work is to develop a control policy capable of navigating cloth using haptic sensory input. Unlike previous work, we aim to develop a generalizable feedback policy that leverages only haptic perception and proprioception to determine the next action from an observed state. As such, our approach uses reinforcement learning to optimize a policy for a haptic-sensing sphere as a building block. The policy makes decisions based only on the contact forces exerted on the sphere and the relative position of the sphere center to the final target, without any prior knowledge or vision sensing capability. By aggregating multiple haptic-sensing spheres, the learned policy can be applied to robots with arbitrary morphologies. Based on haptic forces, each individual sphere proposes an update to its location. The conflicts that arise between this set of desired positions are resolved by solving an inverse kinematics problem with constraints. While the importance of haptics in dressing tasks seems intuitive from our own experiences, it is not clear how humans exploit haptic perception to aid in dressing. In training, we reward the manipulator for proximity to a target position, and we perform early termination of a training rollout if excessive force is sensed between the cloth and the manipulator. This gives the learning algorithm the freedom to explore effective strategies for leveraging haptics in dressing tasks.

Compounding the challenge of incorporating haptics is the computational cost of cloth simulation in a contact-rich environment, as is the case with dressing. Directly generating thousands of rollouts with cloth simulation during policy learning is computationally impractical and prone to overfitting a particular type of garment. In contrast, we hypothesize that many navigation tasks through deformable objects share the same fundamental skill, regardless of the environment or the morphology of the robot. As such, we propose to train a sphere to move through a funnel-like geometry, which provides haptic feedback in the form of contact forces between the sphere and the funnel. Due to the simplicity of the spherical geometry, the contact force can be computed analytically without the need for numerical simulation. Simplifying the task and the environment drastically accelerates the training process, but a question then arises: how well will the policy generalize to complex environments?
The primary contribution of this work is a control system capable of guiding a manipulator of arbitrary morphology to complete the dressing task using only haptic data and Cartesian task targets. We show that a simple policy trained with analytical contact forces can be directly applied to navigating physically simulated cloth. Further, though trained individually, aggregated haptic-sensing spheres can work collectively to guide the manipulator to task completion.

We evaluate our method by testing the policy in various dressing scenarios, including an isolated sleeve, a jacket, a shirt, and shorts. In each test we vary a set of parameters, such as the initial state of the manipulator or the geometric state of the cloth, and measure the success rate of the dressing task over many trials. We also compare our method to two different baseline policies. The first baseline policy operates without the use of haptic sensing, while the second penalizes contact forces quadratically. The results show that our method has a high success rate for all tasks and outperforms both baselines by a wide margin.

II. RELATED WORK

Navigation applications in robotics often rely on one or more of the following assumptions: a collision-free path exists, line-of-sight sensing is available, and the environment is near-static [2]–[6]. When navigating in a cluttered, deformable environment, such as the dressing tasks in our work, none of these assumptions holds true.

One approach to address these issues is to incorporate haptic sensing [7]–[10]. Jain et al. [7] showed that a manipulator can reach goal locations in a cluttered environment by using a model predictive controller (MPC) and tactile sensors over the entire arm. Killpack et al. [10] further improved the MPC by modeling the full dynamics of the robot arm instead of using a quasi-static model. These methods have the ability to predict future contact forces using a bilinear spring contact model. In a highly deformable environment, such as the interior of an article of clothing, this simple contact model is unlikely to capture the complex contact behavior accurately. However, employing a full cloth simulation would increase the computation time significantly, rendering the MPC unable to optimize the control in real time.

The importance of haptic sensing in robotic manipulation of deformable bodies has also been recognized in the emerging area of robot-assisted dressing [11]–[15]. While the problem of manipulating deformable cloth is relevant, our work focuses on developing controllers for self-dressing tasks [1], [16]–[18]. Clegg et al. [1] proposed a full-body self-dressing controller that exploits geodesic distance information encoded on the surface of the garment to guide motion planning. However, due to the lack of haptic sensing, their method is sensitive to cloth materials and initial conditions. Our work attempts to achieve a more robust and generalizable controller for self-dressing using haptic information. Our first attempt to include haptics, as a contact force penalty term in the optimization-based inverse kinematics method proposed by [1], proved to be unsuccessful. Due to the complex deformable geometry of cloth, it is difficult to strike a balance between avoiding large contact forces and making progress toward the end-effector target.
As such, we do not use manual controller design and instead propose a different approach based on reinforcement learning.

Recent advances in reinforcement learning have enabled the training of complex robotic motor skills with high-dimensional continuous state and action spaces [19]–[23], as well as human-expert-level agents for playing Atari games [24] and Go [25]. We use Trust Region Policy Optimization (TRPO) with Generalized Advantage Estimation to train our policy [22], [23]. TRPO has been used to learn complex motor skills such as simulated humanoid running and getting up.

Directly applying TRPO to learn navigation skills in deformable environments is impractical due to the high computational cost of simulating deformable bodies. In this work, our controller is trained to perform navigation in a static environment with simple obstacle geometry. Similar approaches that use a simplified model have been used in previous work on manipulation of deformable objects [26]–[30]. For example, Miller et al. [26] restricted the cloth state to be vertically hanging and used a quasi-static cloth model that neglects cloth dynamics. Phillips-Grafflin et al. [29] used a scalar deformability field, defined on a deformable object represented as voxels, to approximate the penalty for deformation. These methods typically assume that the deformation encountered in the testing environment is similar to that in the training environment. In contrast, our method transfers a policy trained on a single rigid sphere moving through a static, rigid, funnel-like geometry to a different scenario in which a manipulator of arbitrary shape navigates through a detailed, physically simulated garment.

III. METHODS

The core of our method is the development of a sensory-actuator building block, which we call a "haptic-sensing sphere", capable of tracking a target in Cartesian space based on haptic perception and proprioception. We first describe how such a building block can be trained in a reinforcement learning framework. We then describe how these building blocks can be used to create manipulators with arbitrary morphologies and transferred to unseen deformable environments. During policy execution, each haptic-sensing sphere individually suggests an action that avoids tearing the cloth while moving towards its goal. We then apply inverse kinematics (IK) optimization to find an optimal joint configuration for the manipulator that best coordinates these independent movement suggestions. The following sections provide additional details on each of these system components.
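To make the data flow concrete, the following Python sketch outlines one control cycle of the architecture just described. It is illustrative only: `policy.act`, `reconcile_with_ik`, `manipulator.set_pose`, and `cloth.step` are hypothetical stand-ins for the learned policy, the IK solve described later in this section, and the simulators, not the authors' code.

```python
import numpy as np

def control_step(policy, spheres, manipulator, cloth, dt):
    """One control cycle: query each haptic-sensing sphere, then reconcile via IK."""
    suggested = []
    for s in spheres:
        # state = [x_bar - x, f_tilde]: relative target position and normalized contact force
        state = np.concatenate([s.target - s.center, s.haptic_force])
        v = policy.act(state)                    # suggested velocity from the shared policy
        suggested.append(s.center + dt * v)      # suggested next sphere position
    q_star = reconcile_with_ik(manipulator, spheres, suggested)  # Eq. (3), stand-in
    manipulator.set_pose(q_star)                 # kinematically adjust the manipulator
    cloth.step(dt)                               # continue the cloth simulation
```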
A. Training a haptic-proprioception policy

We observe that most dressing tasks have two goals: 1) achieve a desired relative positioning of the garment and the limb, and 2) avoid excessive contact forces that could tear the cloth. To learn the fundamental skill necessary to achieve both goals, we train a haptic-sensing sphere to reach a target location at the center of a stationary rigid funnel using only the contact force and the relative position from the target (Fig. 1). As the haptic-sensing sphere moves, it may come in contact with the funnel. If the depth of penetration between the sphere and the funnel exceeds a predefined threshold, indicating that an unsafe amount of contact force has been exerted, the task is deemed unsuccessful. The task is successful if the haptic-sensing sphere reaches the target location within a certain amount of time.

Fig. 1. (a) A haptic feedback controller must move a sphere with radius $r$, located at $x$, through a stationary rigid funnel to reach the target $\bar{x}$. (b) Rendering of an arm model populated with haptic feedback controller spheres. Sphere radius is for visualization purposes only and does not affect control.

To learn a control policy for the haptic-sensing sphere, we formulate a Markov decision process (MDP) defined by a tuple $(S, A, P, R, \gamma, \rho)$, where $S$ is the state space, $A$ is the action space, $P : S \times A \times S \mapsto \mathbb{R}$ is the transition function, $R$ is the reward function, $\gamma \in [0, 1]$ is the discount factor, and $\rho$ is the distribution of initial states. Solving the MDP yields a stochastic policy $\pi : S \times A \mapsto \mathbb{R}$ that maximizes the expected return

$$\mathbb{E}_{s_0, a_0, \ldots}\left[\sum_{t=0}^{\infty} \gamma^t R(s_t)\right]$$

where $s_0 \sim \rho(s_0)$, $a_t \sim \pi(a_t \mid s_t)$, and $s_{t+1} \sim P(s_{t+1} \mid s_t, a_t)$.

In our problem, a state $[\bar{x} - x, \tilde{f}]$ includes the relative position from the current center of the haptic-sensing sphere $x$ to the target $\bar{x}$, and a normalized contact force computed analytically, $\tilde{f} = \frac{d}{r} n$, where $r$ is the radius of the sphere, $d$ is the penetration depth between the sphere and the funnel, and $n$ is the direction of penetration. The action is simply the velocity $v$ that moves the haptic-sensing sphere to its next location. In our case, the transition function $P$ is deterministic. We numerically integrate the action $v$ to calculate the new position of the sphere, $x_{\text{next}} = x + \Delta t\, v$, from which the new state can be computed.

We use a neural network to represent the policy $\pi$. The rollout terminates immediately if the haptic-sensing sphere successfully reaches the target ($\|x - \bar{x}\| < \epsilon$) or if the penetration is too deep ($d$ exceeds a fixed fraction of the radius $r$). At each state, the reward function $R$ penalizes the distance to the goal and the failure of the task:

$$R(s) = -\|x - \bar{x}\| + \begin{cases} c_{+} & \text{if the sphere is at the target,} \\ -c_{-} & \text{if the penetration is too deep,} \\ 0 & \text{otherwise,} \end{cases}$$

where $c_{+}$ is a constant bonus for reaching the target and $c_{-}$ is a constant penalty for excessive penetration.

The neural network is trained using Trust Region Policy Optimization (TRPO) [22] with a curriculum learning strategy [31]. We start with a shallow funnel, with which TRPO can find a successful policy from random initialization. We then continue training with a wider funnel, as shown in Fig. 1a, which improves the robustness of the learned haptic feedback controller. During training with a funnel, both the initial position of the haptic-sensing sphere and the orientation of the funnel are chosen at random. This helps to train a controller that can handle a variety of goal directions and obstacle orientations.
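The following is a minimal, gym-style sketch of this training environment. It makes several labeled assumptions: a single planar obstacle stands in for the funnel (the analytic sphere-funnel contact follows the same pattern per surface element), and the goal bonus, failure penalty, and penetration threshold are placeholder values, since the paper's constants are not reproduced here.

```python
import numpy as np

class HapticSphereEnv:
    """Sketch of the sphere-training MDP with an analytic (non-simulated) obstacle."""
    def __init__(self, radius=0.05, eps=0.01, dt=0.02):
        self.r, self.eps, self.dt = radius, eps, dt
        self.reset()

    def reset(self):
        self.x = np.random.uniform(-0.5, 0.5, size=3)  # initial sphere center
        self.goal = np.array([0.0, 0.0, 1.0])          # target x_bar (illustrative)
        self.n = np.array([0.0, 0.0, 1.0])             # obstacle normal (plane stands in for funnel)
        self.p = np.array([0.0, 0.0, 0.5])             # point on the obstacle surface
        return self._state()

    def _penetration(self):
        # analytic penetration depth d of the sphere into the obstacle half-space
        return max(0.0, self.r - float(self.n @ (self.x - self.p)))

    def _state(self):
        d = self._penetration()
        f = (d / self.r) * self.n                      # normalized force f_tilde = (d/r) n
        return np.concatenate([self.goal - self.x, f]) # state = [x_bar - x, f_tilde]

    def step(self, v):
        self.x = self.x + self.dt * np.asarray(v)      # x_next = x + dt * v
        dist = np.linalg.norm(self.x - self.goal)
        if dist < self.eps:
            return self._state(), -dist + 1.0, True    # goal bonus c_+ (value assumed)
        if self._penetration() > 0.5 * self.r:         # depth threshold fraction assumed
            return self._state(), -dist - 1.0, True    # early-termination penalty c_- (value assumed)
        return self._state(), -dist, False             # dense distance penalty
```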
B. Controlling a Full Limb

Once trained, multiple haptic-sensing spheres can be aggregated to control a manipulator in a dressing scenario. We represent a manipulator as connected capsules (Fig. 1b) and place multiple haptic-sensing spheres along the medial axis of each capsule. Given the contact forces between the manipulator and the cloth, each haptic-sensing sphere queries its own policy based on the current state, $[\bar{x}_i - x_i, \tilde{f}_i]$, to provide an independent suggestion on how to move the manipulator during dressing.

To compute the contact force $\tilde{f}_i$ exerted on the $i$-th haptic-sensing sphere, we use the PhysX cloth simulator [32] to provide collision detection and resolution between the manipulator and every vertex on the cloth. These per-vertex contact forces are aggregated to compute the contact force on the $i$-th haptic-sensing sphere:

$$\tilde{f}_i = \frac{\sum_{j \in \Omega_i} f_j}{F_{\max}} \quad (1)$$

where $f_j$ is the contact force at vertex $v_j$ and $\Omega_i$ is the set of vertex indices satisfying

$$j \in \Omega_i \;\;\text{if}\;\; i = \arg\min_k d(x_k, v_j) \quad (2)$$

where $d$ is a function that computes the Euclidean distance between the center of the $k$-th sphere and the $j$-th vertex. This operation bins the contact forces from collisions between the character geometry and the cloth into the nearest haptic sensor. Note that while changes in contact geometry and the placement of haptic-sensing sphere centers will affect this process, the radius of the haptic-sensing spheres will not.

$F_{\max}$ is the maximum contact force that can be exerted on the cloth without tearing it. This value can be determined by the user or measured empirically from the cloth simulator, and it accounts for variation in garment mass, size, and material properties. With the normalization by $F_{\max}$ in Equation 1, the magnitude of the contact force exerted on a haptic-sensing sphere always ranges from 0 to 1.

In addition to the normalized contact force, we also need a target $\bar{x}$ as the input for each individual haptic controller. For this purpose, we classify the haptic-sensing spheres into two types: leading controllers and trailing controllers. In our examples, the controller located at the end effector is the leading controller and the rest are trailing controllers. The target of the leading controller is task specific and given by the user. For example, if the task is to stretch the manipulator through a sleeve, the target position would be at the cuff. The target positions of the trailing controllers are their current locations, $\bar{x} = x$. In other words, the trailing haptic controllers will try to stay stationary while avoiding exerting too much force on the cloth. With the current state $[\bar{x}_i - x_i, \tilde{f}_i]$ as the input, the policy of each haptic-sensing sphere computes an action $v_i$, which is the displacement between the current and next locations: $x_i^{\text{next}} = x_i + \Delta t\, v_i$.

Since the action for each sphere is computed without respecting the kinematic constraints of the manipulator, we need to reconcile the suggested motions from the individual spheres. We use an inverse kinematics (IK) solver to find an optimal joint configuration $q^*$ that best matches the collective output of all the haptic-sensing spheres:

$$q^* = \arg\min_q \sum_i w_i \left\| p(q, r_i) - (x_i + \Delta t\, v_i) \right\| \quad (3)$$

where $r_i$ is the local coordinate of the $i$-th haptic-sensing sphere on the manipulator, which is transformed to world coordinates by $p(q, r_i)$, and $x_i + \Delta t\, v_i$ is the new location suggested by the $i$-th haptic-sensing sphere. The weight $w_i$ specifies the relative importance of each sphere.
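A possible implementation of Eqs. (1)–(3) is sketched below: per-vertex cloth forces are binned to the nearest sphere center and normalized by $F_{\max}$, and the IK objective is minimized by finite-difference gradient descent (matching the gradient-descent solution stated in the text). The `forward_positions` callable, a forward-kinematics map from a pose $q$ to all sphere centers $p(q, r_i)$, is an assumed interface, not the paper's implementation.

```python
import numpy as np

def bin_contact_forces(vertex_pos, vertex_force, centers, f_max):
    """Eqs. (1)-(2): sum each per-vertex force into its nearest sphere's bin, then normalize."""
    centers = np.asarray(centers, dtype=float)
    f = np.zeros_like(centers)
    for vp, vf in zip(vertex_pos, vertex_force):
        i = np.argmin(np.linalg.norm(centers - vp, axis=1))  # nearest sphere center (Eq. 2)
        f[i] += vf                                           # accumulate vertex force (Eq. 1)
    return f / f_max                                         # normalized so |f_tilde| is in [0, 1]

def solve_ik(q0, forward_positions, suggested, w, iters=100, lr=1e-2, h=1e-5):
    """Eq. (3): minimize sum_i w_i * ||p(q, r_i) - (x_i + dt*v_i)|| by gradient descent."""
    def cost(q):
        err = forward_positions(q) - suggested               # per-sphere position residuals
        return float(np.sum(w * np.linalg.norm(err, axis=1)))
    q = np.asarray(q0, dtype=float).copy()
    for _ in range(iters):
        g = np.array([(cost(q + h * e) - cost(q)) / h        # finite-difference gradient
                      for e in np.eye(len(q))])
        q -= lr * g                                          # descend toward q*
    return q
```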
In all our experiments, we set the weight of the leading controller to be 40 times larger than those of the trailing controllers. The optimization in (3) is solved by gradient descent. Once the desired configuration $q^*$ is found, we kinematically adjust the manipulator to $q^*$ and continue to simulate the cloth.

IV. RESULTS

The motions of the manipulators in this work are simulated by DART [33], a multi-body physics simulator supported by Gazebo. The cloth is simulated using an implementation of position-based dynamics via PhysX [32], and the garments are represented as triangle meshes with a default cloth material. The haptic feedback control policy is represented by a multilayer perceptron neural network with two hidden layers of tanh units, sketched below. The policy is trained with a fixed budget of TRPO iterations, a fixed number of simulated steps per iteration, and a capped rollout length for each sample. In order to train a control policy that is invariant to the target position direction and the force direction, we randomly sample the orientation of the training funnel geometry and uniformly initialize the sphere within a box centered at the origin. We train the policy for a sphere of fixed radius and a funnel of approximately twice that radius.
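Below is a sketch of such a policy network in plain NumPy: a two-hidden-layer MLP with tanh activations mapping the 6-D state $[\bar{x} - x, \tilde{f}]$ to a 3-D velocity. The hidden width (64) is an assumption; the paper's exact layer size is not reproduced here.

```python
import numpy as np

class MLPPolicy:
    """Two-hidden-layer tanh MLP: 6-D state -> 3-D velocity (hidden width assumed)."""
    def __init__(self, sizes=(6, 64, 64, 3), seed=0):
        rng = np.random.default_rng(seed)
        self.layers = [(rng.standard_normal((m, n)) * np.sqrt(1.0 / m),  # weight matrix
                        np.zeros(n))                                     # bias vector
                       for m, n in zip(sizes[:-1], sizes[1:])]

    def act(self, state):
        h = np.asarray(state, dtype=float)
        for W, b in self.layers[:-1]:
            h = np.tanh(h @ W + b)      # hidden layers use tanh activations
        W, b = self.layers[-1]
        return h @ W + b                # linear output: mean velocity v
```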
TABLE I. Performance comparison on the four dressing tasks at a fixed friction value.

To evaluate the effectiveness of our proposed approach, we examine four representative dressing scenarios of increasing difficulty: a sphere traveling linearly through a cloth tube, dressing a jacket, dressing a pair of shorts, and dressing a T-shirt. The goal of this evaluation is to show that, given minimal task-specific input, our haptics-informed control architecture enables arbitrary limb morphologies to robustly navigate a variety of garments without exerting excessive force. In each case, a guiding path is provided that the manipulator tracks (shown as purple curves in Figures 2 to 5). These paths are interpolating Cartesian splines formed by connecting the centroids of user-defined vertex loops on the garment; for example, one control point may be the centroid of all vertices forming the end of one sleeve. As these vertices move during simulation, the control curve is re-computed and the task target is updated, as sketched below.

To assess the robustness of our controller in performing the dressing tasks, we add variations to the initial conditions for each trial and average the performance of all sampled rollouts to obtain the final success rate. Each simulation rollout's control cycle is preceded by 2 seconds of cloth simulation to settle the garment, followed by 2 seconds of joint pose interpolation during which the limb is moved into its randomly drawn initial condition.
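A minimal sketch of the guiding-path construction, assuming `loops` holds the user-defined vertex-index loops on the garment mesh, and using a cubic interpolating spline as a stand-in for the paper's interpolating Cartesian spline:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def guiding_path(cloth_vertices, loops):
    """Rebuild the guiding spline from loop centroids (at least three loops assumed
    for the default spline boundary conditions)."""
    cps = np.array([cloth_vertices[idx].mean(axis=0) for idx in loops])  # control points
    u = np.linspace(0.0, 1.0, len(cps))
    return CubicSpline(u, cps, axis=0)   # callable path(t) for t in [0, 1]

# usage each control step, since the cloth vertices move during simulation:
#   target = guiding_path(verts, loops)(t_progress)
```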
Below are details about the four dressing tasks, including the variations of initial conditions. We encourage the reader to view examples of these evaluations and additional tasks in our supplemental video.

• Cloth tube: A sphere begins at one end of a cloth tube and follows a fixed linear target trajectory to the other end. The orientation of the tube is chosen from a uniform spherical distribution and varies relative to the gravity direction, resulting in a range of drape configurations. There is no variation in the initial sphere position in this example. The fixed linear target trajectory through the draped regions guarantees that the sphere must push the garment out of the way in order to reach the task goal (Fig. 2).
• Jacket: An arm with approximately human joints and proportions enters through the front of a jacket and navigates into and out of one sleeve. The initial position and joint angles of the arm are varied roughly 17 degrees around a rest pose, and the arm occasionally starts in contact with the garment. The arm must move around and push through the hanging jacket body to reach the sleeve. This task is fairly easy, as the arm is relatively unconstrained, with a large translational range at the shoulder, and the jacket sleeve acts similarly to a loosely hanging cloth tube (Fig. 3).
• Shorts: A leg with a translation-limited hip and approximately human joints and proportions is initialized above a pair of shorts. The toe is guided to pass into one leg of the shorts. There is variation in the hip translation and roughly 6 degrees of variation in all initial joint degrees of freedom of the leg. In this example, the foot must pass through the waistband without catching it with the heel. The leg must also extend in response to forces on the lower thigh and shin in order to respect the inextensibility of the pinned garment. However, the shorts hang open in the gravity direction, making this a medium-difficulty example (Fig. 4).

Fig. 2. Result of cloth tube traversal with our controller.
Fig. 3. Result of jacket dressing with our controller.
Fig. 4. Result of shorts dressing with our controller.
Fig. 5. Result of T-shirt dressing with our controller.
Fig. 6. Examples of Baseline 1 failing to complete dressing tasks.
• T-shirt: The same arm as in the jacket task, now with more limited translation at the shoulder, is initialized inside a T-shirt. The arm is guided to pass up through the body, then into and out of the shirt sleeve nearest the pinned shoulder. The initial joint angles of the arm and the translation range of motion of the shoulder are varied within roughly 17 degrees, with a larger variation in the elbow angle. In order to complete this task, the arm must first directly oppose the end effector motion, pushing the elbow against the back of the shirt, and must later pivot the upper arm around sleeve contact points on the forearm in order to push the arm out of the sleeve. The highly constrained nature of the workspace makes this a challenging task (Fig. 5).

We compare our controller with two baselines: a haptic-unaware controller (Baseline 1), which moves linearly toward the target at a fixed speed, and a haptic feedback controller trained with an additional penalty on the magnitude of force in the reward function (Baseline 2). We compare two metrics across the three controllers: the success rate (SR), computed as the number of successful trials divided by the total number of trials, and the time to completion (TC), the average time to complete the dressing task over the successful trials. All trials are given 10 simulated seconds to complete the navigation task, after which the simulation is terminated and the trial is counted as a failure. The results are shown in Table I; each table entry is the result of 100 trials.

From Table I, we can see that our approach outperforms the baseline controllers in all four dressing tasks. The T-shirt dressing task, which involves navigating the end effector through the garment while the upper arm is inside the garment, poses significant difficulties for the baseline controllers, resulting in zero successful trials. In contrast, the controller trained using our method still achieves a high success rate. Examples of situations in which the baseline controllers fail to perform the dressing tasks are shown in Fig. 6.

To examine how our algorithm generalizes to different cloth materials, we perform the cloth tube traversal task with different friction coefficients between the end effector and the garment. The success rates of the three controllers can be seen in Fig. 7. In the low-friction case, both Baseline 1 and our controller perform well. As friction increases, it becomes more difficult to navigate the cloth tube, and the success rates become lower. However, there remains a key difference between our proposed controller and the haptic-unaware Baseline 1: when Baseline 1 fails, it is due to garment tearing, whereas our controller refuses to continue moving forward and stalls until the time limit is reached. This is an important feature, as any cloth navigation controller's first priority should be to respect force limits.

Although Baseline 2 only completes the tube navigation task at the easiest cloth orientations (nearly vertical), it succeeded at never tearing the cloth. This result is consistent with the expected training outcome of a reward function that penalizes all force, which produces a controller that is unwilling to push on the cloth even when the forces are low. While this policy is quite successful during analytical funnel training, it fails to generalize to cloth navigation. This also explains the slightly poorer performance of this baseline on the jacket example, where the limb must push through the sleeve opening.
This baseline performs better on the shorts example, where the garment more closely resembles a funnel and forces are needed only to avoid tearing the garment. We refer the reader to our supplemental video for examples of the discussed results.

Fig. 7. Impact of friction on cloth tube traversal. For each controller, we uniformly sampled 26 friction coefficients and averaged the success rate for each coefficient over 25 trials.

V. CONCLUSION

We have presented a haptic controller that allows a manipulator to navigate through a variety of deformable cloth geometries, including a shirt, a jacket, and pants. A key aspect of our approach is that, by assembling multiple haptic-sensing spheres, each part of the manipulator can detect and react to collisions with the cloth. Due to the modular nature of our manipulators, we can create various manipulator shapes: a single sphere, an arm, a leg, and an upper torso (a head together with two arms). For all of these examples, our controller outperforms two baseline controllers in terms of frequency of task completion.

Despite the success of our controller, there are several avenues for future work. First, our controller is only capable of responding to a deformable environment and is not capable of high-level planning. A logical extension of our work would be to incorporate our controller into a system that can make higher-level plans, such as making decisions about which direction to tuck an elbow or when to backtrack when the end effector is stuck. A second limitation of our approach is that we do not yet take into account the possibility of self-collisions or kinematic constraints other than joint limits and rigid connections. For most applications, it will be necessary to resolve potential collisions between the manipulator and the robot or an animated human body. Finally, we would like to deploy our controller on a real robot that interacts with cloth. However, several challenges face implementation on a physical robot. First, this will require the development of haptic sensors that are small enough to be densely distributed along a manipulator. Additionally, these sensors must be sensitive enough to detect the small forces that occur when interacting with clothing.

ACKNOWLEDGMENTS

We thank Charlie Kemp for his feedback on this work.
REFERENCES

[1] A. Clegg, J. Tan, G. Turk, and C. K. Liu, "Animating human dressing," ACM Trans. Graph., vol. 34, no. 4, pp. 116:1–116:9, July 2015. [Online]. Available: http://doi.acm.org/10.1145/2766986
[2] M. Dogar and S. Srinivasa, "A framework for push-grasping in clutter," Robotics: Science and Systems VII, vol. 1, 2011.
[3] A. Hornung, M. Phillips, E. G. Jones, M. Bennewitz, M. Likhachev, and S. Chitta, "Navigation in three-dimensional cluttered environments for mobile manipulation," in Robotics and Automation (ICRA), 2012 IEEE International Conference on. IEEE, 2012, pp. 423–429.
[4] A. Leeper, K. Hsiao, M. Ciocarlie, I. Sucan, and K. Salisbury, "Arm teleoperation in clutter using virtual constraints from real sensor data," in RSS Workshop on Robots in Clutter: Preparing Robots for the Real World, 2013.
[5] S. S. Srinivasa, D. Ferguson, C. J. Helfrich, D. Berenson, A. Collet, R. Diankov, G. Gallagher, G. Hollinger, J. Kuffner, and M. V. Weghe, "HERB: a home exploring robotic butler," Autonomous Robots, vol. 28, no. 1, pp. 5–20, 2010.
[6] D. Katz, A. Venkatraman, M. Kazemi, J. A. Bagnell, and A. Stentz, "Perceiving, learning, and exploiting object affordances for autonomous pile manipulation," Autonomous Robots, vol. 37, no. 4, pp. 369–382, 2014.
[7] A. Jain, M. D. Killpack, A. Edsinger, and C. C. Kemp, "Reaching in clutter with whole-arm tactile sensing," The International Journal of Robotics Research, vol. 32, no. 4, pp. 458–482, 2013.
[8] T. Bhattacharjee, P. M. Grice, A. Kapusta, M. D. Killpack, D. Park, and C. C. Kemp, "A robotic system for reaching in dense clutter that integrates model predictive control, learning, haptic mapping, and planning." Georgia Institute of Technology, 2014.
[9] F. Sygulla, C. Schuetz, and D. Rixen, "Adaptive motion control in uncertain environments using tactile feedback," in Advanced Intelligent Mechatronics (AIM), 2016 IEEE International Conference on. IEEE, 2016, pp. 1277–1284.
[10] M. D. Killpack, A. Kapusta, and C. C. Kemp, "Model predictive control for fast reaching in clutter," Autonomous Robots, vol. 40, no. 3, pp. 537–560, 2016.
[11] K. Yamazaki, R. Oya, K. Nagahama, K. Okada, and M. Inaba, "Bottom dressing by a life-sized humanoid robot provided failure detection and recovery functions," in System Integration (SII), 2014 IEEE/SICE International Symposium on. IEEE, 2014, pp. 564–570.
[12] A. Kapusta, W. Yu, T. Bhattacharjee, C. K. Liu, G. Turk, and C. C. Kemp, "Data-driven haptic perception for robot-assisted dressing," in IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2016.
[13] Y. Gao, H. Chang, and Y. Demiris, "Iterative path optimisation for personalised dressing assistance using vision and force information." IEEE, 2016. [Online]. Available: http://hdl.handle.net/10044/1/39009
[14] W. Yu, A. Kapusta, J. Tan, C. C. Kemp, G. Turk, and C. K. Liu, "Haptic data simulation for robot-assisted dressing." IEEE, 2017.
[15] Z. Erickson, A. Clegg, W. Yu, C. K. Liu, G. Turk, and C. C. Kemp, "What does the person feel? Learning to infer applied forces during robot-assisted dressing." IEEE, 2017.
[16] E. S. Ho and T. Komura, "Character motion synthesis by topology coordinates," in Computer Graphics Forum, vol. 28, no. 2. Wiley Online Library, 2009, pp. 299–308.
[17] H. Wang, K. A. Sidorov, P. Sandilands, and T. Komura, "Harmonic parameterization by electrostatics," ACM Transactions on Graphics (TOG), vol. 32, no. 5, p. 155, 2013.
[18] E. Miguel, A. Feng, Y. Xu, A. Shapiro, R. Tamstorf, D. Bradley, S. C. Schvartzman, B. Thomaszewsky, B. Bickel, W. Matusik, et al., "Towards cloth-manipulating characters," in Computer Animation and Social Agents, vol. 3, 2014.
[19] S. Gu, T. Lillicrap, Z. Ghahramani, R. E. Turner, and S. Levine, "Q-Prop: Sample-efficient policy gradient with an off-policy critic," arXiv preprint arXiv:1611.02247, 2016.
[20] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, "Continuous control with deep reinforcement learning," arXiv preprint arXiv:1509.02971, 2015.
[21] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu, "Asynchronous methods for deep reinforcement learning," in International Conference on Machine Learning, 2016.
[22] J. Schulman, S. Levine, P. Moritz, M. I. Jordan, and P. Abbeel, "Trust region policy optimization," CoRR, abs/1502.05477, 2015.
[23] J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel, "High-dimensional continuous control using generalized advantage estimation," arXiv preprint arXiv:1506.02438, 2015.
[24] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, Feb. 2015. [Online]. Available: http://dx.doi.org/10.1038/nature14236
[25] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, no. 7587, pp. 484–489, Jan. 2016. [Online]. Available: http://dx.doi.org/10.1038/nature16961
[26] S. Miller, J. Van Den Berg, M. Fritz, T. Darrell, K. Goldberg, and P. Abbeel, "A geometric approach to robotic laundry folding," Int. J. Rob. Res., vol. 31, no. 2, pp. 249–267, Feb. 2012. [Online]. Available: http://dx.doi.org/10.1177/0278364911430417
[27] J. Van Den Berg, S. Miller, K. Goldberg, and P. Abbeel, "Gravity-based robotic cloth folding," in Algorithmic Foundations of Robotics IX. Springer, 2010, pp. 409–424.
[28] B. Balaguer and S. Carpin, "Motion planning for cooperative manipulators folding flexible planar objects," in Intelligent Robots and Systems (IROS), 2010 IEEE/RSJ International Conference on. IEEE, 2010, pp. 3842–3847.
[29] C. Phillips-Grafflin and D. Berenson, "A representation of deformable objects for motion planning with no physical simulation," in Robotics and Automation (ICRA), 2014 IEEE International Conference on. IEEE, 2014, pp. 98–105.
[30] D. Berenson, "Manipulation of deformable objects without modeling and simulating deformation." IEEE, 2013, pp. 4525–4532.
[31] Y. Bengio, J. Louradour, R. Collobert, and J. Weston, "Curriculum learning," in Proceedings of the 26th Annual International Conference on Machine Learning, ser. ICML '09. New York, NY, USA: ACM, 2009, pp. 41–48. [Online]. Available: http://doi.acm.org/10.1145/1553374.1553380
[32] M. Macklin, M. Müller, N. Chentanez, and T.-Y. Kim, "Unified particle physics for real-time applications," ACM Transactions on Graphics (TOG), vol. 33, no. 4, 2014.