Learning Inverse Kinodynamics for Accurate High-Speed Off-Road Navigation on Unstructured Terrain
Xuesu Xiao, Joydeep Biswas, and Peter Stone

Abstract— This paper presents a learning-based approach to consider the effect of unobservable world states in kinodynamic motion planning in order to enable accurate high-speed off-road navigation on unstructured terrain. Existing kinodynamic motion planners either operate in structured and homogeneous environments and thus do not need to explicitly account for terrain-vehicle interaction, or assume a set of discrete terrain classes. However, when operating on unstructured terrain, especially at high speeds, even small variations in the environment will be magnified and cause inaccurate plan execution. In this paper, to capture the complex kinodynamic model and mathematically unknown world state, we learn a kinodynamic planner in a data-driven manner with onboard inertial observations. Our approach is tested on a physical robot in different indoor and outdoor environments, enables fast and accurate off-road navigation, and outperforms environment-independent alternatives, demonstrating 52.4% to 86.9% improvement in terms of plan execution success rate while traveling at high speeds.

Xuesu Xiao, Joydeep Biswas, and Peter Stone are with the Department of Computer Science, The University of Texas at Austin, Austin, TX 78712. {xiao, joydeepb, pstone}@cs.utexas.edu. Peter Stone is with Sony AI.
I. INTRODUCTION
Current mobile robot navigation methods can navigate a robot from one point to another safely and reliably in structured and homogeneous environments [1], [2], such as indoor hallways or outdoor paved surfaces. These consistent environments allow robots to use simple kinodynamic motion planners that are independent of the environment, thanks to the limited environmental disturbances and stochasticity.

Dating back at least to DARPA's Grand Challenge [3] and the LAGR (Learning Applied to Ground Vehicles) program [4], researchers have also looked into applying autonomous navigation in unstructured outdoor environments. Challenges arise on multiple fronts in those natural spaces, but most off-road navigation work has focused on perception, e.g., detecting natural obstacles [4], classifying underlying terrain types [5]–[7], or building semantic maps [8]–[10]. For motion control, most off-road robots simply travel at low speeds to minimize uncertainty and maximize safety [11]–[14].

While recent advances in deep learning provide roboticists with a different avenue to investigate those perception problems in off-road navigation, researchers have also started to combine perception, planning, and motion control using end-to-end learning in unstructured environments [15], [16]. These systems do not require a heavily-engineered navigation pipeline and can react to natural environments in a data-driven manner. Although these methods can enable successful navigation, they are data-intensive and generally do not lead to better navigation than their classical counterparts.

Fig. 1: The UT Automata scale 1/10th autonomous vehicle drives an eight-turn (T1–T8 on the red path) outdoor race track with unstructured terrain. Close-ups of some terrain (black inserts) are shown, including cement, mud, and grass, which are covered by leaves, stalks, and/or twigs at different densities. For high-speed terrain-aware navigation, the robot has to interact with this unstructured terrain (red insert).

In this paper, we focus on the motion control side of off-road navigation on unstructured terrain, and use learning to capture the effect of complex and unknown environmental factors on the robot's low-level kinodynamic model. We use onboard inertial observations to encode environmental factors and learn an inverse kinodynamic model to produce fast and accurate low-level control inputs. Using our method, even with extensive disturbances caused by high-speed terrain-vehicle interaction, the robot is still able to perform fast and accurate off-road navigation (Fig. 1), compared to a kinodynamic model based on ideal assumptions and a learned model that does not consider environmental factors.

II. RELATED WORK
In this section, we review related off-road navigation work in terms of perception and motion control.
A. Off-road Perception
The first challenge arising from off-road navigation is perception. In unstructured off-road environments, perception is no longer simply a geometric problem (e.g., free vs. obstacle), but also requires semantic information (e.g., grass vs. mud). A plethora of research in terrain classification has leveraged vibration-based signals to classify terrain types [5]–[7]. Vision-based sensors, e.g., LiDAR and camera, combined with current deep learning methods, have also been used to build semantic maps [8]–[10]. These perception methods assign costs to discrete terrain classes for planning, but do not consider the robot's kinodynamic model when moving on these terrain types. Our work does not distinguish among discrete terrain classes and uses observations collected during interactions with different terrain to enable fast and accurate kinodynamic planning.

B. Off-road Motion Control
Although the research thrust for off-road navigation has primarily focused on perception, roboticists have also investigated off-road navigation from the motion control side. Many wheel slip models [11]–[14] have been developed and used to design controllers that minimize slip. Most of these models treat slip only as a function of the vehicle kinematics. But to achieve high-speed off-road navigation, slip is inevitable and also highly dependent on the underlying terrain.

Researchers have also used machine learning for motion control in off-road navigation. A recent survey [17] pointed out that learning is most efficient when targeting specific navigation components, e.g., learning local planners [18]–[21], costmaps [22], [23], or planner parameters [24]–[26]. Research on using learning for motion control in off-road scenarios is scarce. Pan et al. [27] enabled high-speed navigation with end-to-end imitation learning from RGB input on a closed circular dirt track. The expert demonstrator is a model predictive controller with access to high-precision sensors including GPS-RTK. The end-to-end learning approach most likely does not generalize well to other terrain and tracks. Aiming at a variety of terrain, Siva et al. [28] used imitation learning from human demonstration to navigate five discrete terrain types (concrete, grass, mud, pebble, rock). In contrast, our method only targets learning a kinodynamic model and can navigate any global path. We also do not intentionally separate terrain into discrete types, and instead treat different terrain characteristics in a continuous manner.

III. LEARNING INVERSE KINODYNAMIC MODELS
Most navigation systems either assume kinodynamic models to be independent of the environment, or assume that there exists a discrete set of environment classes [28], e.g., one model for paved terrain and another for grass. In this work, we relax these assumptions by learning a single continuous inverse kinodynamic model that accounts for environmental factors across different terrain types without having to perform discrete terrain classification or analytic modeling. The learned model takes as input inertial observations that make the impact of environmental factors on kinodynamic motion observable (e.g., how bumpiness from gravel will result in understeer at high speeds). This inverse kinodynamic model is learned in a data-driven manner.
A. Problem Formulation
Given vehicle state $x$, control input $u$, and world state $w$, the state dynamics and observation $y$ are given by
$$\dot{x} = f(x, u, w), \qquad y = g(x, w), \tag{1}$$
where $f(\cdot, \cdot, \cdot)$ is the system's forward kinodynamic function, while $g(\cdot, \cdot)$ is the observation function. Note that in most cases, $w$ is not directly observable and cannot be easily modeled. A navigation planner generates a global plan $\Pi : [0, 1] \rightarrow \mathcal{X}$ mapping a unitless progress variable $s \in [0, 1]$ to a planned vehicle state $x \in \mathcal{X}$, incorporating both global (e.g., a traversable map) and local (e.g., sensed obstacles) information to take the robot from the start state $\Pi(0)$ to the goal state $\Pi(1)$. A projection operator $\rho : \mathcal{X} \rightarrow [0, 1]$ maps the robot state $x$ (e.g., from localization) to the inferred progress variable $s$ (i.e., the robot's progress along the global plan so far), such that the closest state in the plan to a robot state $x$ is $\Pi(\rho(x))$. For simplicity of notation, we represent the projected state at any time as $x_\Pi = \Pi(\rho(x))$. We also omit the explicit time-dependence of the variables $x(t)$, $u(t)$, and $y(t)$, denoting them simply as $x$, $u$, and $y$. The objective of our controller $u$ is thus to minimize the total navigation time $T$ while following the plan precisely, as represented by the joint cost function
$$J = T + \gamma \int_0^T \lVert x(t) - x_\Pi(t) \rVert \, dt. \tag{2}$$
Here, $\gamma$ is a hyperparameter that trades off total navigation time against execution accuracy.

We formulate the solution to this optimal control problem as a receding-horizon controller $u^*$ over a unitless progress horizon (along the global plan) $\Delta$ and a corresponding time horizon $\Delta t$, such that the control input drives the robot state from $x$ to the receding-horizon plan state $\Pi(\rho(x) + \Delta)$ over the time period $\Delta t$:
$$u^* = \arg\min_u \; \Delta t + \gamma \left\lVert \Delta x_\Pi - \int_0^{\Delta t} f(x, u, w) \, dt \right\rVert, \quad \Delta x_\Pi = \Pi(\rho(x) + \Delta) - x, \tag{3}$$
where $\Delta x_\Pi$ is the state change between the receding-horizon plan and the current vehicle state. (In general, the minus signs in Eqn. 3 denote the generalized difference operator $\ominus$ over the Special Euclidean Group SE($n$) and the corresponding Lie algebra.) The optimal control $u^*$ can be solved for using the receding-horizon inverse kinodynamic model $f^{-1}$ as
$$u^* = f^{-1}(\Delta x, x, w), \tag{4}$$
which takes as input the desired relative state change $\Delta x$, the current robot state $x$, and the world state $w$. Unfortunately, it is hard to express $f^{-1}$ accurately via analytical models, and even if it could be expressed accurately, computing $u^*$ would be error-prone since the world state $w$ is not directly observable.
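To make the projection operator $\rho$ and the receding-horizon target in Eqn. 3 concrete, the following minimal sketch (not from the paper's implementation) assumes the global plan $\Pi$ is discretized into 2D waypoints with headings and that $\Delta$ is a fixed waypoint offset; all names and the discretization are illustrative.

```python
import numpy as np

def project_and_target(plan_xy_theta, robot_pose, delta_steps=10):
    """Compute rho(x) and Delta x_Pi = Pi(rho(x) + Delta) - x.

    plan_xy_theta: (N, 3) array of waypoints [x, y, theta] discretizing Pi.
    robot_pose:    (3,) array [x, y, theta] from localization.
    delta_steps:   receding-horizon offset Delta, here as a waypoint count.
    """
    # rho(x): index of the plan state closest to the robot position.
    dists = np.linalg.norm(plan_xy_theta[:, :2] - robot_pose[:2], axis=1)
    rho = int(np.argmin(dists))

    # Receding-horizon plan state Pi(rho(x) + Delta), clamped to the plan end.
    target = plan_xy_theta[min(rho + delta_steps, len(plan_xy_theta) - 1)]

    # Delta x_Pi expressed in the robot frame (the generalized difference
    # over SE(2) mentioned in the note to Eqn. 3).
    dx, dy = target[:2] - robot_pose[:2]
    c, s = np.cos(-robot_pose[2]), np.sin(-robot_pose[2])
    local_x, local_y = c * dx - s * dy, s * dx + c * dy
    dtheta = np.arctan2(np.sin(target[2] - robot_pose[2]),
                        np.cos(target[2] - robot_pose[2]))
    return rho, np.array([local_x, local_y, dtheta])
```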
B. Learning Inverse Kinodynamics

In this work, in order to enable fast and accurate navigation under the influence of different terrain interactions arising from the world state $w$, we adopt a data-driven approach to capture the effect of $w$. Specifically, we introduce the function
$$f^+_\theta(\Delta x, x, y) \approx f^{-1}(\Delta x, x, w), \tag{5}$$
parameterized by $\theta$, as an approximation of the original receding-horizon inverse kinodynamic function (the superscript $+$ denotes pseudo-inverse). The key insight in this approximation is that the impact of $w$ on $u^*$ becomes predictable given observations $y$ related to high-speed terrain-aware navigation—in our case, we use onboard inertial sensing to capture speed-dependent terrain interaction. To learn $f^+_\theta$, a training dataset $\mathcal{T} = \{\langle \Delta x_i, x_i, u_i, y_i \rangle\}_{i=1}^N$ with $N$ samples is desired, using the optimal but unknown receding-horizon inverse kinodynamic function $u_i = f^{-1}(\Delta x_i, x_i, w_i)$ and observation function $y_i = g(x_i, w_i)$ from Eqns. 1 and 4. Unfortunately, we neither know $f^{-1}$, nor do we know the world states $w_i$. However, we do have access to $f$ as a black-box function via real-world execution: we can simply pick arbitrary sample controls $u_i$ at corresponding starting states $x_i$, and observe the resulting state change $\Delta x_i$ after the chosen receding horizon $\Delta t$, including the impact of the unknown $w_i$:
$$\Delta x_i = \int_0^{\Delta t} f(x_i, u_i, w_i) \, dt.$$
Thus, the originally chosen control $u_i$ is the control for the resulting state change $\Delta x_i$ from the original state $x_i$, for the corresponding but unknown (and hence unrecorded) world state $w_i$. Along with the corresponding sensor observation $y_i$, we generate each sample $i$ for the dataset $\mathcal{T}$. To ensure that the learned parameters $\theta$ of $f^+_\theta$ approximate $f^{-1}$ accurately at states that the robot will encounter during execution, $\mathcal{T}$ must include representative samples of $x$, $u$, and $y$. With the collected dataset $\mathcal{T}$, we formulate deriving $f^+_\theta(\cdot, \cdot, \cdot)$ as a learning problem by minimizing a supervised loss:
$$\theta^* = \arg\min_\theta \sum_{(\Delta x_i, x_i, y_i) \in \mathcal{T}} \lVert f^{-1}(\cdot, \cdot, \cdot) - f^+_\theta(\Delta x_i, x_i, y_i) \rVert_H = \arg\min_\theta \sum_{(u_i, \Delta x_i, x_i, y_i) \in \mathcal{T}} \lVert u_i - f^+_\theta(\Delta x_i, x_i, y_i) \rVert_H, \tag{6}$$
where $\lVert v \rVert_H = v^T H v$ is the norm induced by a positive definite matrix $H$, used to weigh the learning loss between the different dimensions of the control input $u_i$. We represent $f^+_\theta(\cdot, \cdot, \cdot)$ as a neural network and can therefore use gradient descent to find an approximately optimal $\theta^*$. In this work, we collect raw 6-DoF readings from an onboard IMU, and construct $y$ by feeding the inertial data through an autoencoder to encode relevant terrain-vehicle interaction at different driving speeds. More details are provided in Sec. IV. (Here, we assume that the optimal control that follows the global path, i.e., the second term in Eqn. 2, also implicitly minimizes the navigation time, i.e., the first term.)
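A minimal PyTorch sketch of the supervised objective in Eqn. 6 is given below. It is illustrative only: the model interface, the data loader, and the choice of a diagonal $H$ are assumptions, not the paper's released code. The model is assumed to take the desired state change and the inertial observation (with $x_i$ fixed at the robot-frame origin, as described in Sec. IV) and to output a control.

```python
import torch

def h_norm_loss(u_pred, u_true, h_diag):
    """||u_true - u_pred||_H with an assumed diagonal positive-definite H."""
    diff = u_true - u_pred
    return (diff * diff * h_diag).sum(dim=-1).mean()

def train_inverse_kinodynamics(model, loader, h_diag, epochs=50, lr=1e-3):
    """Fit f_theta^+ by supervised regression onto the executed controls u_i.

    loader yields (delta_x, y, u): the observed state change, the inertial
    observation window, and the control that actually produced delta_x.
    """
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for delta_x, y, u in loader:
            u_pred = model(delta_x, y)   # f_theta^+(Delta x, x, y); x is the
                                         # robot-frame origin and is omitted
            loss = h_norm_loss(u_pred, u, h_diag)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```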
C. Online Execution
The learned inverse kinodynamic model $f^+_{\theta^*}(\cdot, \cdot, \cdot)$ provides a means to approximately account for $w$ using the onboard observation $y$ for fast, terrain-aware, and precise navigation. At each time step $t$ during online execution, we compute the desired change of state $\Delta x_\Pi = \Pi(\rho(x) + \Delta) - x$ with $x$ from localization, the projection operator $\rho(\cdot)$, the projection horizon $\Delta$, and the global plan $\Pi(\cdot)$. Along with the onboard observation $y$ and the current vehicle state $x$, we use the learned inverse kinodynamic model $f^+_{\theta^*}$ to produce the system control input
$$u(t) = f^+_{\theta^*}(\Delta x_\Pi, x, y), \tag{7}$$
and repeat this process at every time step.
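A high-level sketch of one step of this loop is shown below, reusing the project_and_target helper sketched in Sec. III-A. The ikd_model, global_plan, localizer, and imu_buffer objects are assumed interfaces for illustration and are not part of the released implementation.

```python
def control_step(ikd_model, global_plan, localizer, imu_buffer, delta_steps=10):
    """One receding-horizon step of Eqn. 7: u(t) = f_theta^+(Delta x_Pi, x, y)."""
    x = localizer.current_pose()                                      # state from localization
    _, delta_x_pi = project_and_target(global_plan, x, delta_steps)   # Pi(rho(x)+Delta) - x
    y = imu_buffer.latest_window()                                    # recent onboard inertial data
    v_cmd, curvature_cmd = ikd_model(delta_x_pi, y)                   # learned inverse kinodynamics
    return v_cmd, curvature_cmd                                       # send to motor controller; repeat
```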
IV. EXPERIMENTS

In this section, we present experimental results using a learned inverse kinodynamic model $f^+_{\theta^*}(\Delta x, x, y)$, which accounts for the unobservable world state $w$ by taking $y$ as input, and can precisely track different global plans produced by different planners $\Pi$. We denote the baseline forward and inverse kinodynamic functions, which do not consider the world state $w$, as $f_B(x, u)$ and $f^+_B(\Delta x, x)$, respectively. As an ablation study to test the effectiveness of capturing the world state $w$ with $y$, we also learn an ablated inverse kinodynamic model $f^+_{A\phi^*}(\Delta x, x)$, which is parameterized by $\phi^*$ and does not take the observation $y$ as input to represent $w$. We show that the learned $f^+_{A\phi^*}$ can outperform the baseline $f^+_B$, and that, by adding the learned observation $y$ as another input, $f^+_{\theta^*}(\Delta x, x, y)$ outperforms both $f^+_{A\phi^*}(\Delta x, x)$ and $f^+_B(\Delta x, x)$.

The learned inverse kinodynamic model used for online execution is also agnostic to different online plans and unseen terrain. We show that the model learned with one global planner $\Pi$ (a randomly exploring policy driven by a human operator) can also generalize well to a complicated global planner $\Pi$ on an outdoor race track, and to a simple global planner $\Pi$ on an indoor track. We also test the generalizability with respect to unseen world states through the encoded $y$ and $f^+_{\theta^*}(\Delta x, x, y)$ on unseen terrain.

A. Implementation

1) Robot Platform:
Our learned inverse kinodynamic model is implemented on a UT Automata robot, a scale 1/10th autonomous vehicle platform (Fig. 2a). The Ackermann-steering, four-wheel-drive vehicle is equipped with a 2D Hokuyo UST-10LX LiDAR for localization, a VectorNav VN-100 IMU for inertial sensing (200 Hz), a Flipsky VESC 4.12 50A motor controller, and a Traxxas Titan 550 motor. Although the platform has individual suspensions, the relatively short-travel suspensions are not specifically designed for off-road navigation. We specifically pick this platform because its small size and weight and its lack of designed off-road capability maximize the difference in accuracy between navigating with and without the learned terrain-aware inverse kinodynamic model. The robot has an NVIDIA Jetson onboard, but only the CPU is used during deployment.
Fig. 2: (a): A UT Automata robot, the scale 1/10th autonomous vehicle platform used in the experiments. (b): The sampling-based baseline approximates $\Delta x_\Pi$ by rolling out the optimal $u_B$ among 100 samples with $f_B$.
2) Environment:
The environment comprises cement, grass, and mud, and some patches are covered by different artifacts, such as leaves, stalks, and/or twigs, at different densities (Fig. 1). For a small vehicle like the UT Automata robot, these artifacts can cause significantly different world states (Fig. 1, red insert). Note that the terrain also changes due to environmental factors such as sunlight, wind, and moisture, and is also affected by the robot's wheels and chassis. To minimize the effect of artifacts being pushed off the course during extended experiments, we frequently shuffle and redistribute the artifacts. We do not specify discrete terrain classes and treat the terrain characteristics in a continuous manner.
3) Model Implementation:
During autonomous navigation, the robot uses Episodic Non-Markov Localization (ENML) [29] with a pre-built map of the environment to derive the vehicle state $x$. A global planner $\Pi$ includes a pre-generated global path for the robot to follow and uses line-of-sight control (similar to [30], [31]) to generate the desired receding-horizon plan state $\Pi(\rho(x) + \Delta)$ on the global path 1 m away from the robot. In a model predictive control manner, the robot uses the baseline forward kinodynamic function $f_B$ and samples candidate velocity and curvature control inputs $u \in \mathcal{U}$, evenly distributed within a physically feasible window, to jointly find the desired state change $\Delta x_\Pi$ and control input $u_B$ (shown in Fig. 2b). More specifically, to compute the control input, the baseline inverse kinodynamic model $f^+_B$ produces the curvature input which results in the desired $\Delta x_\Pi$ and drives the robot as close to $\Pi(\rho(x) + \Delta)$ as possible:
$$u_B = f^+_B(\cdot, \cdot) = \arg\min_u \left\lVert \Pi(\rho(x) + \Delta) - \int_0^{\Delta t} f_B(x, u) \, dt \right\rVert, \tag{8}$$
addressing the second term in Eqn. 2, and selects the fastest possible velocity for the first term $T$, considering the robot's acceleration limit and a safety distance to decelerate in case of obstacles.

Fig. 3: Neural Network Architecture. Input: blue IMU encoder and orange desired state change (desired velocity and curvature in practice); Output: purple learned function approximator $f^+_{\theta^*}$ as the inverse kinodynamic model.

For our learned ablated and final models, $f^+_{A\phi^*}$ and $f^+_{\theta^*}$, we utilize the $\Delta x_\Pi$ from the baseline kinodynamic model (corresponding to $u_B$), but instead of using the baseline's control input, we query our learned models to produce $u = f^+_{A\phi^*}(\Delta x_\Pi, x)$ or $u = f^+_{\theta^*}(\Delta x_\Pi, x, y)$. In practice, we use the baseline control input $u_B = \{v, c\}$ (linear velocity and steering curvature) to represent the desired state change rate $\Delta x_\Pi$.
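For reference, a simplified version of the sampling-based baseline in Eqn. 8 might look as follows. This is a sketch, not the paper's implementation: it assumes a unicycle-like constant-curvature roll-out for $f_B$, 100 evenly spaced curvature samples (as in Fig. 2b), and an illustrative curvature limit and horizon.

```python
import numpy as np

def rollout_arc(v, curvature, dt, horizon):
    """Roll out the ideal forward model f_B along a constant-curvature arc."""
    x = y = theta = 0.0
    for _ in range(int(horizon / dt)):
        x += v * np.cos(theta) * dt
        y += v * np.sin(theta) * dt
        theta += v * curvature * dt
    return np.array([x, y])

def baseline_control(target_xy, v, dt=0.05, horizon=0.5,
                     max_curvature=1.0, num_samples=100):
    """f_B^+: pick the sampled curvature whose roll-out ends closest to the target.

    max_curvature stands in for the physically feasible window; the actual
    limit on the UT Automata platform is not specified here.
    """
    candidates = np.linspace(-max_curvature, max_curvature, num_samples)
    errors = [np.linalg.norm(rollout_arc(v, c, dt, horizon) - target_xy)
              for c in candidates]
    return candidates[int(np.argmin(errors))]
```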
4) Data Collection:
To collect training data, the robot is teleoperated with a joystick in an open environment, with linear velocity $v \in [0, \cdot\,]$ m/s and steering curvature $c \in [-\cdot, \cdot\,]$ m$^{-1}$, for 30 minutes (24,418 data points). The teleoperator randomly varies both the linear velocity and the steering curvature ($\Pi$). In our specific implementation, the robot reasons in the robot frame, and therefore the state $x_i$ in the training trajectory $\mathcal{T} = \{\langle \Delta x_i, x_i, u_i, y_i \rangle\}_{i=1}^N$ becomes the origin of the robot frame. The ground truth $\Delta x_i$ is represented as the realized $\{v^i_r, c^i_r\}$, where $v^i_r$ is from vehicle odometry and $c^i_r = \omega^i_r / v^i_r$ ($\omega^i_r$ is the sensed angular velocity around the vertical $z$ axis from the IMU). Currently, we take $v^i_r$ from wheel odometry only, which can be further improved by adding visual, point cloud, and/or inertial information in future work. The commanded control input $u_i = \{v^i_c, c^i_c\}$ is recorded from the joystick input. For $y$, we collect the 6-DoF raw IMU signal, including the 3-DoF accelerometer and 3-DoF gyroscope, over a sliding history window.
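One way each training tuple could be assembled from the logged signals is sketched below. The 100-reading window, 200 Hz rate, and $c_r = \omega_r / v_r$ relation come from the text; the function and field names, and the guard against division by zero, are assumptions for illustration.

```python
import numpy as np

def make_sample(v_odom, omega_z, v_cmd, c_cmd, imu_window):
    """Build one training tuple <Delta x_i, u_i, y_i> in the robot frame.

    v_odom:       measured linear velocity from wheel odometry (m/s)
    omega_z:      sensed yaw rate from the IMU (rad/s)
    v_cmd, c_cmd: commanded linear velocity and curvature from the joystick
    imu_window:   last 100 6-DoF IMU readings (0.5 s at 200 Hz), shape (100, 6)
    """
    c_real = omega_z / v_odom if abs(v_odom) > 1e-3 else 0.0  # c_r = omega_r / v_r
    delta_x = np.array([v_odom, c_real])    # realized {v_r, c_r} as proxy for Delta x_i
    u = np.array([v_cmd, c_cmd])            # label: commanded control u_i = {v_c, c_c}
    y = np.asarray(imu_window).reshape(-1)  # flattened inertial observation
    return delta_x, u, y
```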
5) Network Architecture:
As a function approximator for $f^+_{\theta^*}$, we use a two-layer neural network with 32 neurons in each layer (shown in purple in Fig. 3). The neural network takes the realized/desired $\{v^i_r, c^i_r\}$ (as a proxy for $\Delta x_i$; Fig. 3, orange) and the observation $y$ as input, and outputs the to-be-commanded control input $u_i = \{v^i_c, c^i_c\}$. For the observation $y$, we concatenate the last 100 IMU readings (0.5 s) into a 600-dimensional vector and feed it into two 256-neuron layers as an autoencoder (Fig. 3, blue). The final embedding of $y$ is a two-dimensional vector, which is then concatenated with $\{v^i_r, c^i_r\}$, and the whole model is trained in an end-to-end fashion. The entire network architecture is shown in Fig. 3. For the ablated model $f^+_{A\phi^*}$, the two-dimensional $y$ embedding is removed (only the orange and purple components remain). Training both models takes less than five minutes on an NVIDIA GeForce GTX 1650 laptop GPU. During runtime, the trained model is run onboard the robot's Jetson CPU with libtorch.
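The description above translates roughly into the following PyTorch module, a sketch under the stated dimensions (a 600-dimensional IMU window encoded through two 256-neuron layers into a 2-D embedding, concatenated with the desired $\{v, c\}$ and passed through 32-neuron layers). The exact layer ordering, activations, and output layer are assumptions, not the released architecture.

```python
import torch
import torch.nn as nn

class InverseKinodynamicModel(nn.Module):
    """Sketch of f_theta^+: (Delta x, y) -> u, following Fig. 3."""

    def __init__(self, imu_dim=600, embed_dim=2, ctrl_dim=2, hidden=32):
        super().__init__()
        # IMU encoder (blue in Fig. 3): 600 -> 256 -> 256 -> 2-D embedding.
        self.imu_encoder = nn.Sequential(
            nn.Linear(imu_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, embed_dim),
        )
        # Function approximator (purple in Fig. 3): 32-neuron layers to the control.
        self.head = nn.Sequential(
            nn.Linear(ctrl_dim + embed_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, ctrl_dim),
        )

    def forward(self, delta_x, imu_window):
        # delta_x: desired {v_r, c_r} (orange in Fig. 3);
        # imu_window: flattened 0.5 s of 6-DoF IMU data.
        y = self.imu_encoder(imu_window)
        return self.head(torch.cat([delta_x, y], dim=-1))
```

For the ablated model $f^+_{A\phi^*}$, the imu_encoder branch would simply be dropped and the head would take only delta_x as input.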
Fig. 4: Results of Outdoor Experiments on Seen Terrain (panels (a)–(j): target speeds 1.6 m/s to 2.5 m/s): Localized robot positions of all 300 laps are plotted around the pre-defined global path (black line segments) on the map (grey lines). The size of the circles at each turn denotes the number of failures at that turn. Red: baseline $f^+_B(\Delta x, x)$; blue: ablated model $f^+_{A\phi^*}(\Delta x, x)$; green: learned model $f^+_{\theta^*}(\Delta x, x, y)$.

Fig. 5: Failure Rate per Target Speed.

Fig. 6: Failure Rate per Turn.

B. Navigation on Seen Terrain

We first test the inverse kinodynamic model's performance on the same terrain where the training data is collected, but with a different global planner $\Pi$. After collecting the training data with $\Pi$, an outdoor race track is constructed using plastic panels and wooden posts (Fig. 1). Starting from the origin (robot location in Fig. 1), eight turns are created (T1–T8, $\Pi$). While Turns 1, 2, and 3 are relatively gentle left-, right-, and left-hand turns, Turn 4 and Turn 8 are roughly 90° left-hand turns. Turns 5, 6, and 7 are sharp 180° left-, right-, and left-hand turns.

Ten different target speeds (the maximum speed the robot targets while maintaining a safety tolerance to decelerate and avoid potential collisions) are tested, ranging from 1.6 m/s to 2.5 m/s in 0.1 m/s intervals. For the three models, the baseline $f^+_B(\Delta x, x)$, the ablated $f^+_{A\phi^*}(\Delta x, x)$, and the learned $f^+_{\theta^*}(\Delta x, x, y)$, we repeat ten trials (laps) each for statistical significance. A total of 300 laps are executed. The localized robot positions from ENML are shown in Fig. 4 in the subplots corresponding to the target speeds.

At lower target speeds, the green trajectories produced by the learned model $f^+_{\theta^*}(\Delta x, x, y)$ are much closer to the pre-defined global path than those of the baseline $f^+_B(\Delta x, x)$, because the latter model fails to consider the world state caused by the unstructured terrain. With increasing speed, the robot trajectories become more scattered around the global path due to increased stochasticity from vehicle-terrain interaction, but overall the green trajectories remain closer to the global path than the alternatives. The blue trajectories are generated by the ablated model $f^+_{A\phi^*}(\Delta x, x)$. Like $f^+_{\theta^*}$, it learns from actual terrain interactions, but it does not consider the current observation $y$, so $f^+_{A\phi^*}$ is roughly an averaged model over the continuous spectrum of terrain. Therefore, $f^+_{A\phi^*}$ outperforms the baseline $f^+_B$, but underperforms $f^+_{\theta^*}$ because it fails to consider the current world state.

At each turn, the size of the red, blue, and green circles represents the number of failed turns (collision or getting stuck) out of the ten attempted turns. Turns 6 and 7 cause a lot of trouble for the baseline even at lower speeds. With increasing speed, more turns cause failures for the other models as well, but in general, the baseline fails more frequently at most turns than the ablated and learned models. Fig. 5 and Fig. 6 show the percentage of failed turns per target speed and per turn, respectively. In Fig. 5, the failure rate increases with faster speed, while within each speed, the learned model achieves the lowest failure rate and the baseline fails most frequently. In Fig. 6, Turn 6 is the most difficult for all three alternatives, and at most turns, the learned model outperforms the ablation and the baseline. Since Turn 8 comes immediately after the terrain change from grass to cement, the robot sometimes oversteers (to compensate for slip on grass) and does not react quickly enough to understeer (for the higher friction on cement), causing it to get stuck in a few laps with the learned model. This problem could be addressed in future work by adding a forward-looking camera to predict future wheel-terrain interaction. The overall success rates of the three models over all speeds and turns are shown in Tab. I.
TABLE I: Overall Success Rates of All Speeds and Turns

          Baseline   Ablation   Learning
Seen      52.4%      82.9%      %
Unseen    49.5%      –          %

C. Navigation on Unseen Terrain
To test that the learned model generalizes to different global planners $\Pi$ and also to unseen terrain, we further conduct an indoor experiment with a different track and global path ($\Pi$) on an unseen wooden floor (Fig. 7a). Note that $f^+_{\theta^*}$ has only seen training data from random exploration on the outdoor terrain (Fig. 1). Since the unseen wooden floor is relatively more consistent, and therefore easier to navigate, than the outdoor unstructured terrain, we increase the navigation target speeds to 2.4 m/s to 2.8 m/s, also in 0.1 m/s intervals. The baseline and the learned model are applied at these five different target speeds, with ten repetitions each.

Fig. 7: Experiment Results in Unseen Environment (panels (a): unseen test environment; (b)–(f): target speeds 2.4 m/s to 2.8 m/s).

Fig. 8: Failure Rate per Target Speed on Unseen Terrain.

Fig. 9: Failure Rate per Turn on Unseen Terrain.

Fig. 7 shows the results from the 100 laps on the unseen terrain with a different global path. Similar to the results on seen terrain, the learned model produces robot trajectories that are more concentrated and closer to the global path being tracked. As shown in Fig. 8 and Fig. 9, the learned model also outperforms the baseline in terms of failure rate at all target speeds and in most turns (except Turn 2). The overall success rates of the baseline and learned model are shown in Tab. I.

V. CONCLUSIONS
In this paper, we present a data-driven approach to learn an inverse kinodynamic model for accurate high-speed navigation on unstructured terrain. To capture the elusive and stochastic world state caused by vehicle-terrain interaction at different high speeds, we use an inertia-based observation embedding as an input to the learned inverse kinodynamic function. This approach is tested on a physical robot on seen and unseen terrain with different global plans at different high speeds. The experimental results show that the learned model can significantly outperform an ideal baseline model that does not consider the world state. Our ablation study also shows that our observation embedding is useful for enabling fast and accurate off-road navigation on unstructured terrain. For future work, better ground-truth linear velocity estimation needs to be investigated: in addition to wheel odometry alone, other sources of perception, e.g., vision, point clouds, and/or inertia, can be leveraged. Better linear velocity estimation can account for significant wheel slippage on more challenging terrain, e.g., on ice, and enable even faster navigation. Adding vision-based observations also has the potential to enable the robot to prepare for future interactions, e.g., to reduce the failures at Turn 8. Another interesting direction to investigate in the future is generalization from easier to harder environments (we speculate that the generalization would not be as good were the model trained indoors, on easy terrain, but applied outdoors).

ACKNOWLEDGMENTS

This work has taken place in the Learning Agents Research Group (LARG) and Autonomous Mobile Robotics Laboratory (AMRL) at the Artificial Intelligence Laboratory, The University of Texas at Austin. LARG research is supported in part by NSF (CPS-1739964, IIS-1724157, NRI-1925082), ONR (N00014-18-2243), FLI (RFP2-000), ARO (W911NF-19-2-0333), DARPA, Lockheed Martin, GM, and Bosch. AMRL research is supported in part by NSF (IIS-1954778, SHF-2006404), DARPA (HR001120C0031), Amazon, JP Morgan, and Northrop Grumman Mission Systems. The views and conclusions contained in this document are those of the authors alone. Peter Stone serves as the Executive Director of Sony AI America and receives financial compensation for this work. The terms of this arrangement have been reviewed and approved by the University of Texas at Austin in accordance with its policy on objectivity in research. The authors would like to thank Sadegh Rabiee, Tongrui Li, and Kavan Sikand for their help with UT Automata.
REFERENCES

[1] S. Quinlan and O. Khatib, "Elastic bands: Connecting path planning and control," in Proceedings IEEE International Conference on Robotics and Automation. IEEE, 1993, pp. 802–807.
[2] D. Fox, W. Burgard, and S. Thrun, "The dynamic window approach to collision avoidance," IEEE Robotics & Automation Magazine, vol. 4, no. 1, pp. 23–33, 1997.
[3] G. Seetharaman, A. Lakhotia, and E. P. Blasch, "Unmanned vehicles come of age: The DARPA Grand Challenge," Computer, vol. 39, no. 12, pp. 26–29, 2006.
[4] L. D. Jackel, E. Krotkov, M. Perschbacher, J. Pippine, and C. Sullivan, "The DARPA LAGR program: Goals, challenges, methodology, and phase I results," Journal of Field Robotics, vol. 23, no. 11-12, pp. 945–973, 2006.
[5] C. Bai, J. Guo, and H. Zheng, "Three-dimensional vibration-based terrain classification for mobile robots," IEEE Access, vol. 7, pp. 63485–63492, 2019.
[6] W. Shi, Z. Li, W. Lv, Y. Wu, J. Chang, and X. Li, "Laplacian support vector machine for vibration-based robotic terrain classification," Electronics, vol. 9, no. 3, p. 513, 2020.
[7] M. Mei, J. Chang, Y. Li, Z. Li, X. Li, and W. Lv, "Comparative study of different methods in vibration-based terrain classification for wheeled robots with shock absorbers," Sensors, vol. 19, no. 5, p. 1137, 2019.
[8] D. Maturana, P.-W. Chou, M. Uenoyama, and S. Scherer, "Real-time semantic mapping for autonomous off-road navigation," in Field and Service Robotics. Springer, 2018, pp. 335–350.
[9] P. Wolf, A. Vierling, T. Ropertz, and K. Berns, "Advanced scene aware navigation for the heavy duty off-road vehicle Unimog," in IOP Conference Series: Materials Science and Engineering, vol. 997, no. 1. IOP Publishing, 2020, p. 012093.
[10] S. Sharma, J. E. Ball, B. Tang, D. W. Carruth, M. Doude, and M. A. Islam, "Semantic segmentation with transfer learning for off-road autonomous driving," Sensors, vol. 19, no. 11, p. 2577, 2019.
[11] S. Rabiee and J. Biswas, "A friction-based kinematic model for skid-steer wheeled mobile robots." IEEE, 2019, pp. 8563–8569.
[12] Y. Tian and N. Sarkar, "Control of a mobile robot subject to wheel slip," Journal of Intelligent & Robotic Systems, vol. 74, no. 3, pp. 915–929, 2014.
[13] F. Rogers-Marcovitz, M. George, N. Seegmiller, and A. Kelly, "Aiding off-road inertial navigation with high performance models of wheel slip." IEEE, 2012, pp. 215–222.
[14] M. Seyr and S. Jakubek, "Proprioceptive navigation, slip estimation and slip control for autonomous wheeled mobile robots." IEEE, 2006, pp. 1–6.
[15] U. Muller, J. Ben, E. Cosatto, B. Flepp, and Y. L. Cun, "Off-road obstacle avoidance through end-to-end learning," in Advances in Neural Information Processing Systems. Citeseer, 2006, pp. 739–746.
[16] A. Giusti, J. Guzzi, D. C. Cireşan, F.-L. He, J. P. Rodríguez, F. Fontana, M. Faessler, C. Forster, J. Schmidhuber, G. Di Caro et al., "A machine learning approach to visual perception of forest trails for mobile robots," IEEE Robotics and Automation Letters, vol. 1, no. 2, pp. 661–667, 2015.
[17] X. Xiao, B. Liu, G. Warnell, and P. Stone, "Motion control for mobile robot navigation using machine learning: a survey," arXiv preprint arXiv:2011.13112, 2020.
[18] ——, "Toward agile maneuvers in highly constrained spaces: Learning from hallucination," arXiv preprint arXiv:2007.14479, 2020.
[19] X. Xiao, B. Liu, and P. Stone, "Agile robot navigation through hallucinated learning and sober deployment," arXiv preprint arXiv:2010.08098, 2020.
[20] B. Liu, X. Xiao, and P. Stone, "Lifelong navigation," arXiv preprint arXiv:2007.14486, 2020.
[21] H.-T. L. Chiang, A. Faust, M. Fiser, and A. Francis, "Learning navigation behaviors end-to-end with AutoRL," IEEE Robotics and Automation Letters, vol. 4, no. 2, pp. 2007–2014, 2019.
[22] M. Wigness, J. G. Rogers, and L. E. Navarro-Serment, "Robot navigation from human demonstration: Learning control behaviors." IEEE, 2018, pp. 1150–1157.
[23] C. Richter and N. Roy, "Safe visual navigation via deep learning and novelty detection," 2017.
[24] Z. Xu, G. Dhamankar, A. Nair, X. Xiao, G. Warnell, B. Liu, Z. Wang, and P. Stone, "APPLR: Adaptive planner parameter learning from reinforcement," arXiv preprint arXiv:2011.00397, 2020.
[25] Z. Wang, X. Xiao, B. Liu, G. Warnell, and P. Stone, "APPLI: Adaptive planner parameter learning from interventions," arXiv preprint arXiv:2011.00400, 2020.
[26] X. Xiao, B. Liu, G. Warnell, J. Fink, and P. Stone, "APPLD: Adaptive planner parameter learning from demonstration," arXiv preprint arXiv:2004.00116, 2020.
[27] Y. Pan, C.-A. Cheng, K. Saigol, K. Lee, X. Yan, E. A. Theodorou, and B. Boots, "Imitation learning for agile autonomous driving," The International Journal of Robotics Research, vol. 39, no. 2-3, pp. 286–302, 2020.
[28] S. Siva, M. Wigness, J. Rogers, and H. Zhang, "Robot adaptation to unstructured terrains by joint representation and apprenticeship learning," in Robotics: Science and Systems, 2019.
[29] J. Biswas and M. M. Veloso, "Episodic non-Markov localization," Robotics and Autonomous Systems, vol. 87, pp. 162–176, 2017.
[30] X. Xiao, J. Dufek, T. Woodbury, and R. Murphy, "UAV assisted USV visual navigation for marine mass casualty incident response." IEEE, 2017, pp. 6105–6110.
[31] X. Xiao, E. Cappo, W. Zhen, J. Dai, K. Sun, C. Gong, M. J. Travers, and H. Choset, "Locomotive reduction for snake robots," in 2015 IEEE International Conference on Robotics and Automation (ICRA).