Machine Learning-Based Automated Design Space Exploration for Autonomous Aerial Robots

Srivatsan Krishnan†, Zishen Wan†, Kshitij Bharadwaj†, Paul Whatmough∓, Aleksandra Faust§, Sabrina M. Neuman†, Gu-Yeon Wei†, David Brooks†, and Vijay Janapa Reddi†
†Harvard University  ∓ARM Research  §Google Brain Research
Abstract—Building domain-specific architectures for autonomous aerial robots is challenging due to a lack of systematic methodology for designing onboard compute. We introduce a novel performance model called the F-1 roofline to help architects understand how to build a balanced computing system for autonomous aerial robots, considering both the cyber components (sensor rate, compute performance) and the physical components (body dynamics) that affect the performance of the machine. We use F-1 to characterize commonly used learning-based autonomy algorithms on onboard platforms to demonstrate the need for cyber-physical co-design. To navigate the cyber-physical design space automatically, we subsequently introduce AutoPilot, a push-button framework that automates the co-design of cyber-physical components for aerial robots from a high-level specification, guided by the F-1 model. AutoPilot uses Bayesian optimization to automatically co-design the autonomy algorithm and hardware accelerator while considering various cyber-physical parameters, generating an optimal design under different task-level complexities for different robots and sensor framerates. As a result, designs generated by AutoPilot on average lower mission time by up to 2× over baseline approaches, conserving battery energy.

I. Introduction
Autonomous robots, such as self-driving cars and aerial robots, are on the rise [1]–[6]. Building computing systems for these domains is challenging because autonomous robots differ from traditional computing systems (embedded systems, servers, etc.): a robot must sense the environment through its sensors, make real-time decisions (e.g., detection and evasion) with the available onboard compute, and actuate itself within the environment (e.g., evade an obstacle). These robots have cyber components (sensor/compute) and physical components (such as frames/rotors) that interact with one another to work as one coherent system. Hence, autonomous robots are cyber-physical systems (CPS), and the traditional computing platform is just one component among many others.

To design the optimal onboard compute, we need to perform cyber-physical co-design. The selection of the cyber and physical components affects the system "performance" (i.e., velocity, mission time, energy) of the aerial robot. For instance, cyber quantities, such as the sensor framerate and the processing rate of the sensor data, determine how fast the aerial robot reacts in a dynamic environment, which in turn determines the safe velocity. Physical quantities, such as weight (frame, payload), determine whether the physics allows the robot to accelerate and move faster.

To perform cyber-physical co-design, we must first understand the role of computing (specifically in autonomous aerial robots), and then design domain-specific architectures. To intuitively understand the role of computing in such a cyber-physical system, we introduce the "Formula-1" (F-1) visual performance model to guide the design of optimal systems for a given robot task. F-1 determines which of the cyber-physical components (compute, sensor, body) determines the safe operating velocity; safe high-speed autonomous navigation remains one of the key challenges in enabling aerial robot applications [7]–[10].
Safety ensures that the control algorithm is reactive to a dynamic environment, while high-speed navigation ensures that the aerial robot finishes tasks quickly, thereby lowering mission time and energy [11].

Using F-1, we show that a performant aerial robot requires careful co-design of the autonomy algorithm as well as the underlying hardware, along with the cyber-physical parameters of the aerial robot. We evaluate two popular learning-based autonomy algorithms, DroNet [12] and VGG-16 (CAD2RL) [13], on computing platforms used in aerial robots, namely the Nvidia Xavier, Nvidia TX2, Intel NCS, and Ras-Pi. Our observations show that ad-hoc selection of autonomy algorithms or onboard computing platforms is far from optimal.

To efficiently design domain-specific architectures while being cognizant of the cyber-physical parameters, we introduce AutoPilot, an intelligent cyber-physical design space exploration framework that uses the F-1 model to automatically generate the optimal (learning-based) autonomy algorithm and its associated hardware accelerator from high-level user-defined robot task, platform constraint, and optimization target specifications. AutoPilot consists of two parts: (1) a learning-based autonomy algorithm generator and (2) a multi-objective algorithm-hardware tuner. The algorithm generator focuses on producing a functionally correct learning-based autonomy algorithm: AutoPilot automatically trains and tests the neural network-based autonomy algorithm for a given aerial robot task using deep reinforcement learning (RL) [14]. The tuner uses multi-objective Bayesian optimization [15] to automatically tune the learning-based autonomy algorithm's hyperparameters and the hardware accelerator parameters simultaneously to meet the optimization target (e.g., high safe velocity, lower mission energy) specified in the high-level specification.

Fig. 1: (a) Robot components, their interactions, paradigms for achieving autonomy, and the low-level flight controller. (b) The action throughput of the sense-compute-control pipeline depends upon the throughput of the sensor, compute, and controller sub-systems [16], [17].
We use AutoPilot to automatically generate Pareto-optimal design points for aerial robot navigation tasks. We show AutoPilot's ability to generate these design points for three different target drone platforms (mini-UAV, micro-UAV, and nano-UAV) with sensor framerates of 30 FPS and 60 FPS. We show that AutoPilot's generated optimal design point achieves up to 2×, 1.54×, and 1.81× lower mission energy for the mini-UAV, micro-UAV, and nano-UAV, respectively, compared to using commercial off-the-shelf accelerators (Nvidia TX2) or other non-optimal design points generated by AutoPilot. Our results show the importance of cyber-physical co-design, as opposed to ad-hoc stand-alone design of onboard computing platforms, and the implication of selecting optimal design points on mission time and mission energy.

In summary, we make the following contributions:
1) F-1, a visual performance model to understand the role of a computing platform in aerial robots while considering other components such as the sensor and body dynamics.
2) AutoPilot, an intelligent cyber-physical design space exploration framework that allows us to automatically co-design a learning-based control algorithm with the accelerator from a high-level user specification.
3) Exploiting cyber-physical co-design to maximize safe flying velocity while minimizing overall mission energy.

II. Autonomous Aerial Robot Background
This section provides background on the key components in aerial robots, the role of the flight controller, and a brief overview of the two control algorithm paradigms, namely "Sense-Plan-Act" and "End-to-End Learning".
A. Aerial Robot Components
Autonomous aerial robots typically have three key components: rotors, sensors, and an onboard computing platform. The rotors determine the thrust that the aerial robot can generate. The sensor (e.g., a camera) allows the aerial robot to sense the environment. The onboard compute executes the autonomy algorithm to process the sensor data. The size of an aerial machine plays an important role in component selection.
B. Flight Controller
The task of the flight controller is stabilization and control of the aerial robot. It is designed in a multi-level hierarchical fashion and is realized using PID controllers. The flight-controller firmware stack is computationally light and typically runs on microcontrollers [18], [19]. To stabilize the drone against unpredictable disturbances (sudden winds or damaged rotors), the inner loop typically runs at closed-loop frequencies of up to 1 kHz [20], [21].
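A minimal sketch of one such PID loop, stepped at the 1 kHz inner-loop rate mentioned above. The gains and the roll-error example are illustrative assumptions, not actual flight-controller firmware:

```python
# Illustrative single PID loop of the kind cascaded hierarchically in a
# flight controller (gains are made up for the example).
class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Inner stabilization loop at 1 kHz (dt = 1 ms), as in the text.
pid = PID(kp=1.0, ki=0.1, kd=0.05, dt=1e-3)
command = pid.step(setpoint=0.0, measurement=0.2)  # counteract a roll error
```

A positive roll error here yields a negative corrective command; in a real stack several such loops are cascaded (attitude rate, attitude, position).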
C. Onboard Compute
In addition to the flight controller, there is a separate, dedicated computer responsible for generating high-level actions from various autonomy algorithms (described later in Section II-D). Nano-UAVs, due to their size and weight limits, typically use microcontrollers as the onboard computing platform. For example, the CrazyFlie [22] weighs less than 27 g and is powered by an ARM Cortex-M4 microcontroller. On the other end are mini-UAVs, which are bigger and have a higher payload capacity. Mini-UAVs typically use a general-purpose computing platform such as an Intel NUC or Nvidia Jetson TX1/TX2. For example, the AscTec Pelican [23], which weighs 1.6 kg, is powered by an Intel NUC platform.
D. Autonomy Algorithms
Autonomous behavior of an aerial robot is achieved by algorithms that fall into two broad categories, namely "Sense-Plan-Act" and "End-to-End Learning". In "Sense-Plan-Act," the algorithm is broken into three or more distinct stages, namely the sensing stage, the planning stage, and the control stage. In the sensing stage, the sensor data is used to create a map [24]–[26] of the environment. The planning stage [27], [28] processes the map to determine the best (e.g., collision-free) trajectory. The trajectory information is used by the control stage, which actuates the rotors so the robot stays on the trajectory. The execution time for these algorithms varies from hundreds of milliseconds to a few seconds [11].

End-to-End learning methods, which we focus on in this work, directly process raw input sensor information (e.g., RGB, Lidar, etc.) and use a neural network model to produce output actions directly. Unlike the Sense-Plan-Act paradigm, end-to-end learning methods do not require maps or separate planning stages and hence are much faster than non-NN-based autonomy algorithms [29], [30]. The model can be trained using supervised learning [12], [31]–[33] or reinforcement learning [13], [34], [35].
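The end-to-end paradigm can be sketched in a few lines: a single forward pass maps a raw frame straight to an action, with no map or planner in between. The tiny two-layer "network" and the three-action set below are illustrative stand-ins, not DroNet or VGG-16:

```python
import random

# Stand-in end-to-end policy: flattened 8x8 frame -> hidden layer -> action.
random.seed(0)
IN, HID, OUT = 64, 16, 3   # input pixels, hidden units, {left, forward, right}
W1 = [[random.gauss(0, 0.1) for _ in range(HID)] for _ in range(IN)]
W2 = [[random.gauss(0, 0.1) for _ in range(OUT)] for _ in range(HID)]

def policy(frame):
    x = [p for row in frame for p in row]                    # flatten the frame
    h = [max(0.0, sum(x[i] * W1[i][j] for i in range(IN)))   # ReLU hidden layer
         for j in range(HID)]
    logits = [sum(h[j] * W2[j][k] for j in range(HID)) for k in range(OUT)]
    return logits.index(max(logits))                         # argmax action

frame = [[random.random() for _ in range(8)] for _ in range(8)]
action = policy(frame)
```

The latency of exactly this forward pass (T_compute in the next section) is what the choice of onboard accelerator determines.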
III. F-1 Performance Model
In this section, we introduce the F-1 visual performance model, which helps computer architects understand whether a robot's performance is bottlenecked by the selection (or design) of the compute (and autonomy algorithm), or by other components in the aerial robot such as the sensor or its body dynamics (laws of physics). We first start with an overview of F-1 as a performance model and explain how it can be useful. Then we describe how we construct the F-1 model.

The F-1 model visually resembles the traditional computer-system roofline model [36], albeit the parameters in the F-1 model quantify the aerial robot as a holistic system as opposed to the compute system in isolation. Similar to the roofline model, the F-1 model can be used by computer architects in two ways. First, it can be used as a visual performance model to understand various bounds and bottlenecks. Second, it can identify an optimal system (autonomy algorithm + onboard compute) for an aerial robot.
A. Need for a Cyber-Physical Performance Model
The rate at which motion decisions are made in a drone depends on the speeds of the components within the sensor-compute-control pipeline (Fig. 1b): the sensor capturing a snapshot (e.g., an image) of the environment, the computer processing the sensor data to generate high-level decisions, and the controller realizing the final decisions. The slowest of the sensor, compute, and control subsystems creates the upper bound on the rate at which final decisions are generated. The decision-making rate determines how fast an intelligent agent (biological or mechanical) can travel while maintaining maneuverability. For example, consider the case of a drone flying through a crowded obstacle course. While the drone's response time to new stimuli is governed by the total latency of the entire sensor-compute-control pipeline, the rate at which the drone can output motor actions is tied to the maximum throughput of that pipeline (Fig. 1b). As long as the total latency is low enough to perceive and track objects in the environment (e.g., obstacles, other drones), the speed with which the drone can maneuver through obstacles with agility is limited by the rate at which valid decision actions can be output by the pipeline (i.e., the throughput).

Our insight is that this problem resembles the canonical rate-matching problem in computer systems. Computer architects are familiar with modeling this using analytical models such as bottleneck analysis [16], Roofline [36], and Gables [17]. However, to achieve high-speed agility for drones, one must also consider the effect of physical quantities (governed by physics) and how they affect the selection of the sensor, compute, and control subsystems. Traditional computer-architecture models fall short of capturing these effects. Hence, to design an agile high-speed drone, one must factor in both the physical quantities and the rate matching of the individual subsystems. The F-1 model unifies the parameters determining the decision-making rate and the parameters determining the drone's physics to effectively realize agile high-speed flight.
B. Using the F-1 Model
The F-1 model defines the upper bound on the safe velocity, considering the maximum rate at which the drone's sensor-compute-control pipeline can make a decision. Responsiveness within a safe perceptual operating regime is the typical use case for most drones, and to ensure that the drone stays in that safe regime, it can be programmed to invoke a stopping policy [23], [37], [38]. Our work focuses on operating efficiently within the safe regime to maximize agile velocity, and thus minimize mission time and battery energy.

The F-1 model is a log-scale plot of safe velocity (V_safe) on the y-axis against "action throughput" (f_action) on the x-axis (Fig. 2a). The action throughput is the throughput of the sensor-compute-control pipeline, i.e., the rate at which decisions (e.g., move forward, turn left, etc.) are generated. Safe velocity (V_safe) is defined as the velocity at which an aerial robot can travel without colliding with an obstacle. Any speed less than or equal to V_safe guarantees safety, while any speed exceeding V_safe is considered unsafe.

The F-1 model shows that a robot's velocity increases with improving throughput of its sense-compute-control pipeline only up to a point, after which it is independent of the pipeline's throughput. We define the decision-making rate of the robot as f_action, and its inverse, the control period, as T_action. Because the stages in the sensor-compute-control pipeline can run concurrently, the minimum control period of the pipeline can never be smaller than the maximum latency across the subsystems:

max(T_sensor, T_compute, T_control) ≤ T_action    (1)

If the stages of the pipeline are not fully overlapped, the smallest practical control period may approach the total pipeline latency:

T_action ≤ T_sensor + T_compute + T_control    (2)

While the perceptual responsiveness to new stimuli (i.e., latency) is fixed at the upper bound in Eq. 2, through successful pipelining we can output new control actions at a higher rate, approaching the lower bound in Eq. 1. As long as the robot's perceptual responsiveness is within a safe operating regime, as mentioned earlier, this allows the robot to execute complicated maneuvers at a higher traveling velocity, making for a more agile drone with shorter mission times. The upper bound on the action throughput (f_action) for a pipelined scenario follows from Eq. 1:

f_action = 1 / max(T_sensor, T_compute, T_control)    (3)

where T_sensor = 1/f_sensor is the latency to sample data from the sensor. If the aerial robot has a 60 FPS camera, the sensor data can be sampled at 16.67 ms intervals, which becomes the sensor latency. T_compute is the latency of the autonomy algorithm to estimate the high-level action commands; the algorithm running on the computing system feeds on the sensor data, and the compute throughput is a function of the autonomy algorithm (Section II-D) as well as the underlying hardware architecture. T_control = 1/f_control is the latency to generate the low-level actuation commands; typical values of f_control are upwards of 1 kHz [21].

Fig. 2: (a) Using the F-1 model to understand the different bounds, namely the compute, sensor, and body-dynamics bounds. The roofline is determined by the body-dynamics bound, while the compute and sensor bounds add ceilings to the F-1 model. (b) Determining the optimal design using the F-1 model. (c) Changing a_max leads to new rooflines in the F-1 model. (d) Changing the sensor range (d) leads to new rooflines.

With these terms in place, the F-1 visual performance model can be used to perform a bound-and-bottleneck analysis to determine whether the safe velocity is affected by the onboard sensor/compute. Any point to the left of the "knee-point" in F-1 (Fig. 2a) denotes that the safe velocity is bounded by the choice of compute (and autonomy algorithm) or sensor, and any point to the right of the knee-point denotes that the velocity is bounded by the body dynamics of the aerial robot. Ideally, to achieve the optimal pipeline design, the action throughput should be equal to that of the knee-point.
Body-Dynamics Bound.
An aerial robot's physical properties, such as its weight and the thrust produced by its rotors, determine how fast it can move; hence the ultimate bound on the safe velocity (V_safe) is determined by its body dynamics. We call the region to the right of the knee-point (i.e., where the sense-to-act throughput is greater than or equal to f_k) body-dynamics bound. In this region, unless the physical components are improved (e.g., by increasing the thrust-to-weight ratio), the velocity cannot exceed the current peak safe velocity no matter how fast a decision is made (i.e., regardless of the selection of a faster compute/sensor).

Sensor Bound.
The choice of onboard sensors limits the decision-making rate (f_action), which can limit the safe velocity (V_safe). As shown in Fig. 2a, a robot's velocity is sensor-bound if its action throughput is equal to the sensor's frame rate (f_sensor) but less than the knee-point throughput (f_k). The sensor-bound case occurs when the compute throughput (f_compute) is greater than or equal to the sensor throughput (f_sensor), so that the action throughput equals f_sensor according to Eq. 3, and f_sensor < f_k. In this scenario, the sensor adds a new ceiling to the F-1 model, bounding the velocity under V_s. In this region, unless the sensor throughput is improved (e.g., a higher-FPS sensor), the velocity cannot exceed the sensor-bound ceiling (V_s) no matter how fast the onboard compute can process the sensor input.

Compute Bound.
The choice of onboard compute (or autonomy algorithm) also affects the decision-making rate (f_action). As also shown in Fig. 2a, a robot's velocity is compute-bound if its action throughput (f_c) is less than both the sensor's frame rate (f_s) and the knee-point throughput (f_k). In this case, the computing platform adds a new ceiling to the roofline model, bounding the velocity under this limit (V_c). In this region, unless the compute performance is improved (e.g., via hardware accelerators or algorithm-hardware co-design), the velocity cannot exceed V_c.

Optimal Design.
The F-1 model can identify system designs that achieve an optimal, balanced overall system capability. Fig. 2b shows how understanding the bounds on safe velocity using F-1 can help in designing an optimal system for aerial robots. For a given robot with fixed mechanical properties, changing the sensor type or onboard compute impacts f_action. Consequently, the optimal design point is where the sensor throughput and compute throughput result in an action throughput equal to the knee-point throughput (f_k).

Over-Optimal Design.
If the action throughput is f_over such that f_over > f_k, then the sensor/compute is over-optimized, since any value greater than f_k yields no improvement in the velocity of the aerial robot. Such an over-designed computing/sensor platform involves not only extra optimization effort but also burns additional power, which further increases the drone's total power and decreases its overall battery life.

Sub-Optimal Design.
The F-1 model can help architects understand the performance gap between the current compute design and the optimal design. For instance, if the action throughput is f_sub, such that f_sub < f_k, then the sensor/compute is under-optimized, which signifies that the current system is off by (f_k − f_sub) and there is scope for improvement through a better algorithm or a better selection (or design) of the computing system.

C. Constructing the F-1 Model
In this section, we describe how we construct the F-1 model, starting from prior work [23] that has established and validated the relationship between the cyber-physical parameters and the safe velocity of the aerial robot, as described by Eq. 4:

V_safe = a_max (sqrt(T_action² + 2d/a_max) − T_action)    (4)

Eq. 4 states that if the robot's body dynamics (physics) permit it to accelerate at most by a_max, its compute and sensors permit it to sense and act at an interval of T_action (1/f_action), and its sensor(s) can sense the environment as far as d meters, then the robot can travel as fast as V_safe.

For instance, Fig. 3 depicts an aerial robot with its field of view (FoV) [39] and an obstacle (e.g., a tree or a bird) within the FoV. The FoV is the region of the environment that the sensor can observe. In this scenario, the aerial robot can travel at most at V_safe and still stop without colliding with the obstacle.

Fig. 3: Maximum safe velocity for a given aerial robot.

To construct the model, we sweep T_action from 0 → ∞ for a fixed maximum acceleration (a_max = 50 m/s²) and sensor range (d = 10 m), as shown in Fig. 4a. We observe an asymptotic relation between velocity and T_action: as T_action → 0, the velocity → 32 m/s (as seen in the magnified portion of Fig. 4a); likewise, as T_action → ∞, the velocity → 0. We also plot f_action (the inverse of T_action) on the x-axis and velocity on the y-axis in Fig. 4b, with both axes on a linear scale. As T_action decreases (i.e., 1/T_action increases), there is a sudden transition in velocity (0 to 31 m/s), which saturates thereafter. We see that there is a point beyond which increasing f_action does not increase the velocity, showing a saturation or roofline. Fig. 4c plots the x-axis on a log scale, which reveals the transition that was not evident on the linear scale (Fig. 4b) or in the original CPS relation (Fig. 4a). We also annotate the three plots with two sample points, denoted point 'A' and the 'knee-point'. Point A has an f_action of 1 Hz, while the knee-point has an f_action of 100 Hz. Going from point A to the knee-point denotes a 100× improvement in action throughput and translates to an increase in velocity from 10 m/s to 30 m/s, whereas even a 100× improvement in f_action beyond the knee-point results in only a 1.0004× improvement in velocity (signifying no meaningful improvement). Hence, increasing the action throughput (e.g., with a faster computing platform or faster sensor) beyond a certain point yields no improvement in velocity.

To visualize the F-1 model (Fig. 2a), we need to show two regions: (i) where a robot's velocity depends on f_action, and (ii) where the velocity is independent of f_action.

D. Effects of Cyber and Physical Component Interaction
In this section, we show how the parameters in Eq. 4 couple the cyber and physical components in an aerial robot. The cyber components integrate the sensing, computation, and control pipeline in drones; their effect is abstracted by T_action (1/f_action) in Eq. 4. The physical components of an aerial robot, such as the mass of the sensor/compute/body frame/battery, the thrust-to-weight ratio, aerodynamic effects such as drag [40], and the sensing quality, are abstracted by the a_max and d parameters in Eq. 4.

The three parameters (T_action, a_max, d) in Eq. 4 can be used to capture the overheads of improving safety, reliability, and redundancy. For instance, the safety of autonomous vehicles can be improved by increasing the FoV [39] (i.e., reducing the blind spot) [41], by designing better tracking algorithms [42]–[44], and/or by adding redundancy in compute [45], [46].

The a_max parameter captures the physical effects of adding payload (sensor, onboard compute, battery, etc.) to the aerial robot. The payload weight affects the thrust-to-weight ratio [47], which lowers a_max [48]. The F-1 model captures the impact of varying a_max on V_safe: a higher a_max leads to a higher V_safe (with the roofline shifting upwards), as shown in Fig. 2c.

Fig. 4: The CPS relationship and the roofline model. (a) CPS model. (b) Linear scale. (c) Log scale.
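The sweep behind Fig. 4 and the parameter sensitivities of Figs. 2c/2d can be checked numerically from Eq. 4, V_safe = a_max(sqrt(T_action² + 2d/a_max) − T_action), using the text's values (a_max = 50 m/s², d = 10 m):

```python
import math

def v_safe(t_action, a_max, d):
    # Eq. 4: safe velocity given action period, max acceleration, sensor range.
    return a_max * (math.sqrt(t_action ** 2 + 2 * d / a_max) - t_action)

# Sweep from the text: a_max = 50 m/s^2, d = 10 m.
v_knee = v_safe(1 / 100, 50.0, 10.0)   # knee-point, f_action = 100 Hz -> ~31 m/s
v_a    = v_safe(1.0, 50.0, 10.0)       # point 'A', f_action = 1 Hz -> ~9 m/s
limit  = math.sqrt(2 * 50.0 * 10.0)    # T_action -> 0 asymptote, ~31.6 m/s

# Sensitivities of Figs. 2c/2d: raising a_max or d raises the roofline.
v_more_a = v_safe(1 / 100, 80.0, 10.0)   # higher thrust-to-weight ratio
v_more_d = v_safe(1 / 100, 50.0, 20.0)   # longer-range (e.g., laser) sensor
```

The T_action → 0 limit, sqrt(2 · a_max · d) ≈ 31.6 m/s, matches the ~32 m/s asymptote seen in the magnified portion of Fig. 4a.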
The d parameter captures the sensing quality of the aerial robot. For instance, a laser-based sensor can provide a longer sensing range, whereas a camera-array-based depth sensor has a limited range [49]. The F-1 model captures the impact of varying d on V_safe: a higher d leads to a higher V_safe (with the roofline and slope shifting upwards), as shown in Fig. 2d.

Lastly, the f_action parameter captures the effect of the sensor framerate, improvements to the autonomy algorithm, or the onboard compute. Any additional latency incurred due to extra sensing/computation (e.g., sensor fusion) affects f_action according to Eq. 3. The F-1 model captures the impact of varying f_action by adding new ceilings, which limit V_safe.

In summary, Eq. 4 couples the cyber and physical components and their associated effects into a single relationship. Thus, the F-1 model, which is built on Eq. 4, provides a unified performance model for computer architects to design onboard compute while taking the cyber-physical effects into account.

E. Validation and Generalizability
The F-1 model is derived by plotting the CPS relationship between safe velocity (V_safe) and throughput (f_action). The CPS relationship has been validated in environments with varying obstacle densities, both in simulation and in the real world on a quadcopter with wind speeds up to 7 m/s. The F-1 model applies to both autonomy algorithm paradigms (Section II-D) and to quadrotors of all sizes. As we show later, it is useful for analyzing nano, micro, and mini UAVs.
IV. F-1 Analysis of Off-the-shelf Compute
We use F-1 to characterize the performance of commonly used learning-based autonomy algorithms running on real-world computing platforms used in aerial robots. We show that commonly used autonomy algorithms and hardware platforms do not lead to optimal robot velocity, indicating that the choice of (1) onboard computing platform and (2) autonomy algorithm affects the maximum safe velocity of the robot, thus confirming the need for cyber-physical co-design.

We consider a baseline aerial robot that has a thrust-to-weight ratio of 2.4 [50], is equipped with a 60 FPS camera sensor, and weighs 1350 g, including the weight of the sensor, body frame, and battery. The robot is human-teleoperated; it comes with a microcontroller unit but has limited computing and memory capacity for autonomy algorithms beyond the flight controller stack. Since this onboard compute system does not use a hardware accelerator, we refer to this baseline as "No-Acc". Such a robot can achieve a maximum acceleration of 15.95 m/s², annotated as "Body Roof" in Fig. 5. The vertical red dotted line in Fig. 5 denotes the sensor throughput (f_s).

We augment the baseline robot configuration with four different off-the-shelf accelerators that have varying compute capabilities: Nvidia Xavier, Nvidia TX2, Intel NCS, and Ras-Pi 3b. These systems are selected because they are used in real aerial robots [51]–[54]. Therefore, in addition to the "No-Acc" baseline, we create four other robot configurations, each using a different accelerator, while the rest of the mechanical parameters (e.g., sensor) remain the same as the "No-Acc" baseline. Two autonomy algorithms that have been used for aerial robots in prior work are selected to run on these four configurations: VGG-16 [13] and DroNet [12].

Compute is heavy, and weighs down the aerial robot's agility.
High-performance onboard compute can process the autonomy algorithms faster, but it trades off this performance against higher TDP and weight, which in turn lower the maximum acceleration (a_max). Table I shows the maximum acceleration for each of the four robot configurations when using the different accelerator-based computing platforms. Since the Xavier (high performance ⇒ high TDP ⇒ larger heatsink) is the heaviest of the four, it shows the lowest acceleration, while the Ras-Pi and Intel NCS (low performance ⇒ low power ⇒ lighter heatsink) achieve the highest. However, these peak acceleration values are still lower than the "No-Acc" baseline acceleration of ~16 m/s², implying that it is important to consider the effect of compute weight on a robot's maximum acceleration.

High-performance compute does not imply a high-performance aerial robot.
A high-performance onboard compute platform does not always translate to higher robot performance (e.g., velocity or mission energy). For instance, Fig. 5a shows DroNet running on the four different onboard compute platforms. In this case, the low-performance NCS can achieve a higher velocity than the high-performance TX2 and Xavier, as shown by their rooflines. This is because both the TX2 and Xavier have higher TDPs and thus heavier heatsinks, which lower the maximum acceleration and, in turn, the velocity. The NCS is over-designed in performance at a lower power (its f_action is to the right of its knee-point) and thus achieves a higher velocity by being lighter than the TX2 and Xavier. In the case of the Ras-Pi, even though it is lighter than the TX2 and Xavier, its performance is lower (f_action is to the left of the knee-point), making it compute-bound, which lowers the velocity.
Computationally intensive algorithms need high-performance compute.
Fig. 5b shows the ceilings for the platforms running VGG-16 (the Ras-Pi 3b runs out of memory for VGG-16). The action throughputs of the Xavier, TX2, and NCS are dominated by their compute latencies, which are higher than the sensor latency. The Xavier achieves a higher action throughput of 28 Hz compared to the TX2 (10 Hz) and NCS (1.3 Hz). For the Xavier, TX2, and NCS, the velocity is compute-bound, as the action throughput is to the left of the roofline's knee-point. However, the Xavier is the least compute-bound among these accelerators, since its action throughput is closest (within 3.5%) to its roofline's knee-point. As a result, the Xavier achieves a higher maximum velocity than the other accelerators. It is still not an optimal choice of compute, however, as its velocity (9.56 m/s) is far from the baseline No-Acc maximum velocity (11.64 m/s) due to its weight.

Platform   | TDP (W) | Weight (g) | Heatsink weight (g) | Robot base weight (g) | Max. acceleration (m/s²)
No-Acc     | —       | —          | —                   | 1350                  | 15.95
Xavier     | <30     | 280 [55]   | 162                 | 1350                  | 11.58
TX2        | <15     | 85 [56]    | 81                  | 1350                  | 14.40
Ras-Pi     | —       | — [57]     | —                   | 1350                  | 15.60
Intel NCS  | <1      | 42 [58]    | 5.4                 | 1350                  | 15.10

TABLE I: Targeted computing platforms/control algorithms.

Fig. 5: F-1 roofline plots for the two end-to-end learning models, (a) DroNet and (b) VGG-16, running on the Nvidia Xavier, Nvidia TX2, Intel NCS, and Ras-Pi (AscTec Pelican drone; body rooflines at a = 15.95 m/s² for No-Acc, 11.58 for Xavier, 14.40 for TX2, 15.10 for NCS, and 15.60 for Ras-Pi). C.T. is the throughput of running an algorithm on a given hardware, shown only when it is greater than f_s (denoted by the vertical red line); the action throughput in these cases is equal to f_s.

Takeaway.
While high performance ensures that velocity is not compute-bound, low power dissipation translates into lower weight (a smaller heat sink) and hence supports a higher a_max (a higher roofline). Given that the action throughputs of these commonly used autonomy algorithms and computing platforms are not optimal, we need algorithm-hardware co-design to achieve design points close to the knee point.
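The reasoning above can be sketched as a small classification routine. This is an illustrative reading of the F-1 model, not the paper's implementation; the 29 Hz knee point below is a hypothetical value chosen near Xavier's reported 28 Hz throughput, which the text says is within 3.5% of its knee.

```python
def classify_design(compute_throughput_hz, sensor_rate_hz, knee_point_hz):
    """Place a design on the F-1 roofline (illustrative sketch).

    The robot cannot act faster than it can sense or compute, so the
    action throughput is the minimum of the two rates.
    """
    f_action = min(compute_throughput_hz, sensor_rate_hz)
    if f_action < knee_point_hz:
        # Left of the knee: something in the cyber pipeline is the bottleneck.
        return "compute-bound" if compute_throughput_hz < sensor_rate_hz else "sensor-bound"
    # At or right of the knee: velocity is capped by body dynamics (a_max).
    return "body-dynamics-bound"

# Xavier running DroNet: ~28 Hz compute throughput vs. a hypothetical 29 Hz knee
print(classify_design(28.0, 60.0, 29.0))  # -> compute-bound
```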
V. AutoPilot
Our F-1 analysis motivates the need to determine the best platform (i.e., autonomy algorithm and accelerator design) that will result in a knee-point action throughput while considering drone body dynamics and sensor type. To this end, we introduce the
AutoPilot cyber-physical co-design framework. Given a robot's high-level specification, such as its thrust-to-weight ratio, sensor type, and target task/environment, the tool automatically finds the optimal NN policy and its accelerator to ensure robust navigation and maximize safe velocity.

AutoPilot consists of three phases (Fig. 6). Phase 1 takes an input specification of the robot, trains various neural network (NN) policies for a given task/environment, and measures the effectiveness of these policies in terms of success rate. Phase 2 performs an automated design space exploration (DSE) to find candidate NN policies and accelerator architectures that are optimal in terms of success rate and hardware power/performance. Phase 3 then uses the F-1 performance model to select, from the Phase 2 candidates, the NN policy and accelerator design that maximize velocity and success rate.
A. Phase 1: Specification and Training
In Phase 1, the user provides an input
Specification and configures the NN training environment via the
Air Learning
NN training gym. The specification consists of all the inputs to the AutoPilot framework, such as the robot task, environment, optimization target (velocity), the robot's physical properties, etc.
[Fig. 6 diagram: Phase 1 (Air Learning training, Section 4.1) launches parallel Air Learning instances and fills an Air Learning database with NN parameters; Phase 2 (design space exploration engine, Section 4.2) couples Bayesian optimization with a cycle-accurate simulator over NN and HW parameters; Phase 3 (CPS co-design with the F-1 model, Section 4.3) applies compute weight modelling and, if the knee point is not reached, architectural fine-tuning with a bag of architectural optimizations (frequency scaling, technology scaling), yielding the optimal policy + hardware accelerator for deployment. Example specification: success rate > 90%, sensor frame rate [30, 60] FPS, TDP [1-10] W, thrust-to-weight ratio [1.5-3], optimization target: velocity.]
Fig. 6:
AutoPilot design methodology for automating cyber-physical co-design in aerial robots.
The Air Learning training simulator [59] is used to train different NN policies for a given environment.
Specification.
There are three main categories within the specification. The first is the robot task-level specification, such as the success rate. The second includes specifications of the target CPS system: the sensor framerate, the rigid-body dynamics (thrust-to-weight ratio), the power of the rotors/body/sensors, etc. The last is the optimization target, such as maximizing velocity or the number of missions, which AutoPilot uses to determine the final NN policy and hardware accelerator architecture.
Reinforcement-Learning Training.
AutoPilot uses Air Learning [59] to train and validate learning-based autonomy algorithms for a given robot task. Air Learning provides a high-quality implementation of reinforcement learning algorithms that can be used to train an NN policy for aerial robot navigation tasks. It includes a configurable environment generator [60] with domain randomization [61] support that allows changing various parameters, such as the number of obstacles, the size of the arena, etc. We customize these parameters to generate different environments, with a varying number of obstacles, to represent changes in task complexity.

To determine the NN policy for each robot task (environment complexity in obstacles, congestion, etc.), we start with the basic template used in Air Learning [14] and vary its hyperparameters (number of layers/filters) to create many candidate NN policies. Based on the specified robot task and the desired success rate, AutoPilot launches several Air Learning training instances in parallel for the different NN policy candidates. Each NN policy that achieves the required success rate is evaluated in a random environment to validate its task-level functionality. The validated NN policies are stored in an Air Learning database along with their success rates, which are then used by Bayesian optimization in the next DSE phase.
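The filtering step at the end of Phase 1 can be sketched as follows. The record fields and threshold here are hypothetical stand-ins for the actual Air Learning database schema.

```python
def build_policy_database(candidates, required_success_rate):
    """Keep only validated policies that meet the required success rate."""
    database = []
    for policy in candidates:
        if policy["success_rate"] >= required_success_rate:
            database.append({
                "id": policy["id"],
                "hyperparams": policy["hyperparams"],  # e.g., layers, filters
                "success_rate": policy["success_rate"],
            })
    return database

# Hypothetical training results from two parallel Air Learning instances
candidates = [
    {"id": "nn-a", "hyperparams": {"layers": 3, "filters": 32}, "success_rate": 0.86},
    {"id": "nn-b", "hyperparams": {"layers": 2, "filters": 16}, "success_rate": 0.71},
]
db = build_policy_database(candidates, required_success_rate=0.80)
# db retains only "nn-a"
```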
B. Phase 2: Design Space Exploration
In Phase 2, an automated multi-objective DSE is performed to find NN policies and hardware accelerator architectures that are optimal in terms of success rate and accelerator performance/power for a target environment. The success rate is affected only by the NN hyperparameters (e.g., number of layers/filters). The accelerator's runtime and power depend on both the NN and the accelerator microarchitecture parameters (number of processing elements, on-chip memory, etc.). Success rates for the NN policies are read from the Air Learning database, while a cycle-accurate simulator evaluates accelerator performance/power for the different policies and hardware configurations. To achieve rapid convergence to optimal solutions without performing an exhaustive search, Bayesian optimization is used to tune the different parameters.
Air Learning Database.
This database stores the training results for the various NN policies trained using Air Learning. Each entry has an NN policy identifier, the hyperparameters used for training, and the performance of the NN policy validated for a given task. Example performance metrics are the success rate and the number of steps the aerial robot takes to reach the goal.
Cycle-Accurate Hardware Simulator.
AutoPilot uses SCALE-Sim, a configurable systolic-array-based cycle-accurate DNN accelerator simulator [62]. It exposes various microarchitectural parameters, such as array size (number of MAC units), array aspect ratio (array height vs. width), scratchpad memory sizes for the input feature maps (ifmaps), filters, and output feature maps (ofmaps), and dataflow mapping strategies, as well as system integration parameters such as memory bandwidth. Taking these architectural parameters, the filter dimensions of each DNN layer, and the image size as input, SCALE-Sim generates the latency, utilization, SRAM accesses, DRAM accesses, and DRAM bandwidth requirement.

While SCALE-Sim only generates performance metrics for the hardware accelerator, we augmented it with power models. The SRAM power is estimated using CACTI [63], and the DRAM power is estimated using Micron's DDR4 power calculator [64]. We assume the accelerator is integrated into the final SoC. The details of the SoC-level integration and the estimation of SoC power are in Section VI.
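A hedged sketch of how the simulator's access counts might be folded into an accelerator power estimate. The per-access energies and per-PE power below are placeholder constants for illustration, not CACTI or Micron outputs.

```python
def estimate_power_w(sram_accesses, dram_accesses, num_pes, runtime_s,
                     e_sram_j=5e-12, e_dram_j=100e-12, p_pe_w=2e-3):
    """Combine memory access energies and PE power into average power (W).

    Energy per access (J) times access count, divided by runtime, gives
    average memory power; the systolic array contributes per-PE power
    times the array size.
    """
    sram_power = sram_accesses * e_sram_j / runtime_s
    dram_power = dram_accesses * e_dram_j / runtime_s
    pe_power = num_pes * p_pe_w
    return sram_power + dram_power + pe_power

# Hypothetical counters for one inference pass over a 16x16 array
estimate_power_w(sram_accesses=1e9, dram_accesses=1e8, num_pes=256, runtime_s=0.04)
```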
Bayesian Optimization.
AutoPilot uses Bayesian optimization [15] for multi-objective DSE to generate task-system Pareto frontiers. Bayesian optimization has been shown to be highly effective for optimizing black-box functions [65], [66] that are expensive to evaluate and cannot be expressed as closed-form expressions. BayesOpt can achieve faster convergence than genetic algorithms when optimizing multiple objectives [67]. In AutoPilot, BayesOpt optimizes three objective functions: (i) task success rate, (ii) SoC power, and (iii) accelerator inference latency (runtime). A Pareto-optimal design is one that achieves maximum task success rate and minimum inference latency and SoC power. The algorithm tunes NN policy hyperparameters (such as the number of layers/filters) and accelerator hardware parameters (e.g., number of processing elements, SRAM sizes, etc.) to converge to Pareto-optimal NN policies and accelerator architectures. An open-source BayesOpt implementation [15] is used in AutoPilot.
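The Pareto-optimality criterion over these three objectives can be made concrete with a minimal dominance filter. This is an illustrative post-processing step over sampled designs, not the BayesOpt implementation itself.

```python
def dominates(a, b):
    """a dominates b: no worse in all objectives, strictly better in one.

    Success rate is maximized; power and latency are minimized.
    """
    no_worse = (a["success"] >= b["success"] and
                a["power"] <= b["power"] and
                a["latency"] <= b["latency"])
    strictly = (a["success"] > b["success"] or
                a["power"] < b["power"] or
                a["latency"] < b["latency"])
    return no_worse and strictly

def pareto_front(designs):
    """Keep the designs not dominated by any other design."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]

# Hypothetical sampled designs: (success rate, SoC power W, latency s)
designs = [
    {"success": 0.85, "power": 2.0, "latency": 0.040},
    {"success": 0.85, "power": 4.0, "latency": 0.040},  # dominated by the first
    {"success": 0.90, "power": 6.0, "latency": 0.010},
]
front = pareto_front(designs)  # first and third designs survive
```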
C. Phase 3: Cyber-Physical Co-Design with F-1
The goal of Phase 3 is to find a design point (policy and accelerator) with optimal success rate and velocity. There are two steps involved: CPS co-design and architectural tuning.
CPS Co-Design.
First, the designs with the highest success rate (the minimum success rate is user-specified) are selected from the Phase 2 output. Then, the velocities for these designs are computed using the CPS relation (Section III-C), which accounts for the effect of the weight of the different components, including the compute, on velocity.

Next, AutoPilot constructs the F-1 roofline plot (following Section III-C), which consists of a roof corresponding to the baseline robot (i.e., human-operated, with no onboard NN accelerator) and other roofs corresponding to the velocities of the success-rate-filtered designs. The latter roofs are close to or lower than the base roof due to the added weight of the accelerators. Finally, the design is selected that achieves the maximum velocity, equivalent to that of the human-operated base robot, and whose action throughput equals the base knee-point throughput.
Architectural Fine-Tuning.
When no design achieves the base knee-point velocity, some architectural tuning may be required to shift a design closer to the knee point. AutoPilot provides two options for choosing which points to optimize: (i) they can be user-defined, or (ii) the design point closest to the knee point can be selected. The tuning applies a variety of optimizations until the optimized design sits at (or very close to) the base knee point in the F-1 roofline.

We employ a bag of architectural optimizations in the tuning process. AutoPilot comes with two techniques: frequency scaling and technology scaling. In frequency scaling, we increase or decrease the operating frequency to trade off the performance and power of the hardware accelerator. Lowering the frequency leads to lower power (TDP), which reduces the heat-sink weight and increases a_max and velocity; this is useful when a design is body-dynamics bound and over-designed. Likewise, increasing the operating frequency improves accelerator runtime and can be used when a design is under-designed and compute-bound. In technology scaling, we evaluate designs in different process technology nodes to see whether a design can be moved closer to the knee point.
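The frequency-scaling trade-off can be sketched under two simplifying assumptions: throughput scales linearly with clock, and power scales roughly linearly with frequency (dynamic power dominant). The 90 Hz starting throughput is hypothetical; the 1 GHz / 7.5 W starting point matches design '3' in Section VII-C.

```python
def frequency_scale(throughput_hz, power_w, f_old_hz, f_new_hz):
    """Linear first-order model of frequency scaling (illustrative only)."""
    scale = f_new_hz / f_old_hz
    return throughput_hz * scale, power_w * scale

# Design '3': 1 GHz, 7.5 W, body-dynamics bound; scale down to 125 MHz
tput, power = frequency_scale(throughput_hz=90.0, power_w=7.5,
                              f_old_hz=1e9, f_new_hz=125e6)
# power drops to ~0.94 W under this linear model (the paper reports ~1 W),
# shrinking the heat sink and raising a_max
```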
Summary.
The AutoPilot methodology is general (ML-based multi-objective DSE) and can be extended in scope to include other autonomous vehicles such as cars (with their CPS models [68], [69]), other autonomy algorithms (Section II-D), and other hardware targets (e.g., FPGAs, CGRAs, multi-cores, systolic/non-systolic arrays, etc.). Within a fixed accelerator target, any other architectural optimization technique that trades off power and performance (e.g., policy quantization [70], model compression [71], memory optimizations [72]) can be added to the bag of architectural optimizations.
VI. Experimental Setup
Air Learning Training Environments.
We generate two environments with varying degrees of clutter using the Air Learning environment generator. The arena size is typical and is twice the arena sizes used in aerial robotics testbeds [73]-[76]. The NN is trained using Deep Q-Networks (DQN) [77], which work well on high-level navigation tasks for aerial robots [78], [79]. We use the same reward function and other hyperparameters as the authors of Air Learning [59]. Training is terminated after 1 M steps or once the required success rate is reached.
NN Policy Architecture Search.
We use the Air Learning model architecture as the baseline template and change its hyperparameters. The NN policy is multi-modal, and prior work [59] has shown that each input modality contributes to the success rate for the task. The basic template of the architecture used in that work is shown in Fig. 7a. We made additional changes to the base template, such as the choice of filter sizes, strides, etc. We choose a filter size of 3 ×

SoC Power Estimation.
We assume an SoC that includes the hardware accelerator architecture template shown in Fig. 7b. To estimate the total SoC power, we add the power of the individual components in the SoC. To estimate the power of the hardware accelerator, we run a given NN policy on the cycle-accurate simulator, which produces SRAM traces, DRAM traces, and the number of read/write accesses to the SRAM and DRAM. Using the SRAM and DRAM trace information, we model the SRAM power in CACTI [63] and the DRAM power with the Micron DRAM model [64]. To estimate the power of the systolic array, we multiply the array size by the energy of a PE; the PE power is modeled after the breakdown in [81].

For the ULP camera, we assume the camera can sustain frame rates of up to 60 FPS at a 144 x 256 image size, at a low power of less than 100 mW and a form factor of 6.24 mm ×
(a) NN policy hyperparameters. (b) Accelerator sub-system.
Fig. 7: (a) NN policy tuned with Bayesian optimization. (b) The accelerator consists of a systolic array of PEs and on-chip buffers for storing the input activations (IFMAP SRAM), filter weights (Filter SRAM), and output feature maps (OFMAP SRAM).

The Cortex-M cores receive the high-level action commands from the accelerator sub-system through the system bus after each frame is run through the NN policy. The NN produces the action, which the flight controller interprets to generate the low-level motor actuation signals that control the aerial robot.
Compute Weight Estimation.
Using the SoC power as the heat source, we calculate the required heat-sink volume with a heat-sink calculator [87]. The weight of the heat sink is the estimated volume multiplied by the density of aluminum (a commonly used heat-sink material). We also assume the final SoC is mounted on a PCB with all electrical components weighing 20 g (which, per our analysis, is typical for Ras-Pi [57] and CORAL [88] like systems).
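The compute-weight model described above can be sketched as follows. The cm^3-per-watt coefficient is a placeholder for the heat-sink calculator's output; the aluminum density (~2.70 g/cm^3) and the 20 g board weight follow the text.

```python
ALUMINUM_DENSITY_G_PER_CM3 = 2.70
PCB_AND_COMPONENTS_G = 20.0  # typical for Ras-Pi/CORAL-class boards (per the text)

def compute_weight_g(tdp_w, cm3_per_watt=2.0):
    """Estimate the compute payload weight added to the drone.

    Heat-sink volume is taken proportional to TDP (placeholder coefficient),
    then converted to weight via the density of aluminum.
    """
    heatsink_volume_cm3 = tdp_w * cm3_per_watt
    heatsink_weight = heatsink_volume_cm3 * ALUMINUM_DENSITY_G_PER_CM3
    return heatsink_weight + PCB_AND_COMPONENTS_G

compute_weight_g(tdp_w=4.0)  # 4 W * 2 cm^3/W * 2.70 g/cm^3 + 20 g = 41.6 g
```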
VII. Evaluation
We present the results and analysis of AutoPilot (i.e., compute DSE, CPS co-design, and architectural fine-tuning). We then show that SoCs optimized for velocity lead to an increase in the total mission count.
A. Compute Design Space Exploration (DSE)
Since off-the-shelf components fall short of optimal, we demonstrate that AutoPilot can automatically explore a large design space to find optimal NN policies and accelerator designs. We show the system's ability to generate a variety of policies and architectures by subjecting AutoPilot to environments with varying levels of obstacle density. Increasing complexity affects both the NN policy (deeper policies) and the hardware accelerator design. Fig. 8 shows the designs obtained using AutoPilot for two task complexities (low and high obstacle density). Each design point represents the SoC power, DNN accelerator inference latency, and success rate (color map).

As described in Section V, AutoPilot uses Bayesian optimization to tune the various parameters until convergence while optimizing the costs (performance, power, and success rate). While the NN policy determines the success rate, the accelerator power and performance depend on both the policy and the HW parameters. AutoPilot converges to optimal accelerator designs by sampling less than 0.5% of the total design space. AutoPilot tunes the NN policies such that they have 2-6 layers, with each layer having 32, 48, or 64 filters. For
the more complex task, AutoPilot automatically selects deeper NN policies to reach a comparable success rate. For instance, 32 filters (and 3-5 layers) are sufficient to achieve a success rate above 80% for low obstacle density, whereas 48 filters are required for high obstacle density to reach a similar success rate.

AutoPilot tunes the hardware accelerator parameters to generate designs ranging from low-power to high-performance; we specifically tune the array height/width between 16 and 128 and the SRAM (ifmap/ofmap/filter) sizes between 32 KB and 2 MB. Fig. 8 highlights three regions in the DSE to show how AutoPilot can generate hardware accelerator candidates under given power-performance bounds irrespective of task complexity. Regions A, B, and C denote bounds of under 2 W (25 FPS), 2-4 W (50 FPS), and 4-8 W (100 FPS), respectively. As task complexity changes, AutoPilot can generate a multitude of design candidates within the same power-performance bounds. As we co-design cyber-physical parameters, having multiple design candidates translates to greater scalability of the methodology in selecting optimal compute platforms as the sensor or body dynamics change (Section VII-B).

Fig. 8: DSE for environments with varying complexity: (a) low obstacle density; (b) high obstacle density. Regions A, B, and C are marked in each panel.
B. Cyber-Physical Co-Design
While compute DSE generates a large spread of architectural designs, not all points are suited for deployment on an aerial robot to achieve a balanced system (as shown in Section IV using F-1). Hence, in this section we show that (1) the F-1 model is essential for finding the accelerator architecture, based on a user specification (e.g., drone type, sensor framerate), that leads to optimal robot velocity, and (2) architectures optimized for raw performance or low power do not necessarily land at the optimal knee point (maximum velocity).

For a comprehensive analysis, we perform CPS co-design with three aerial robots: the AscTec Pelican (mini-UAV), the DJI Spark (micro-UAV), and a nano-UAV [89], which have thrust-to-weight ratios (including battery/sensor) of 2.4, 1.9, and 3.1, respectively, representing a range of body dynamics. We also consider sensor framerates of 30 and 60 FPS.

Fig. 9 shows CPS co-design for the navigation task in the high-density environment. We filter the design points from Fig. 8b by high success rate, as shown in Fig. 9a. These designs represent accelerator candidates for the NN policy that achieves a success rate of at least 83.4% (4 layers and 32 filters); a success rate greater than 80% is nominal [33], [90] for aerial robot navigation tasks.

Fig. 9: (a) Filtering design points by success rate. (b) AscTec Pelican, (c) DJI Spark, and (d) nano-drone [89], each with a 30 FPS (top) and 60 FPS (bottom) sensor: cyber-physical co-design using the F-1 model for high-obstacle-density navigation. C.T. is the throughput of an algorithm, shown only when it is greater than f_s (the action throughput in these cases is equal to f_s). K.P., C.B., and B.D. denote knee point, compute-bound, and body-dynamics bound, respectively.

Of the many accelerator design candidates, we highlight four designs, denoted ‘1’ (lowest power, slowest runtime), ‘2’ (AutoPilot-selected), ‘3’ (highest performance, highest power), and ‘4’ (AutoPilot-selected). The architectural details of these design points, such as the systolic array size and IFM/filter memory, are annotated in Fig. 9a. Using these four points, we demonstrate the need for the F-1 model when designing onboard compute for aerial robots. We also show that cyber-physical co-design is critical to achieving compute platforms that maximize velocity, rather than isolated hardware design objectives such as high performance, low power, or energy efficiency.
F-1 model identifies optimal design points.
Plotting the four architectural design points on the F-1 roofline model for the AscTec Pelican (Fig. 9b), DJI Spark (Fig. 9c), and nano-UAV [89] (Fig. 9d), with 30 FPS and 60 FPS sensor framerates, we observe that the balanced, high-performance, and low-power design points are all far from the optimal knee point for their respective aerial robots. Instead, design point ‘2’, selected by AutoPilot, is the optimal knee point for the AscTec Pelican with a 60 FPS sensor. For the AscTec Pelican with a 30 FPS sensor, design point ‘4’ is optimal in terms of compute: any further improvement in compute performance yields no improvement in velocity, since performance is bound by the sensor framerate (30 FPS). For the DJI Spark with 30 FPS and 60 FPS sensors, and for the nano-drone [89] with a 30 FPS sensor, AutoPilot selects ‘4’ as the optimal compute design. For the nano-drone with a 60 FPS sensor, however, ‘4’ is not the optimal knee point and results in a compute-bound scenario.

Using the F-1 model in the CPS co-design phase, we show that ad hoc selection of high-performance compute designs such as ‘3’ can degrade the overall performance (e.g., high-speed velocity) of the drone. For instance, the highest-performance accelerator design point ‘3’ (highest power) decreases the safe velocity of the DJI Spark and nano-UAV [89] by 13.2% and 44%, respectively, due to the added weight of the heat sink, compared to the baseline No-Acc case. Thus, the F-1 performance model lets us pick the optimal design rather than the typical low-power, high-performance, or balanced architecture that would often result if the compute were designed in isolation without CPS co-design.
One-size compute does not fit all.
For the AscTec Pelican (Fig. 9b) with 30 FPS and 60 FPS sensor framerates, the optimal design points are ‘4’ and ‘2’, respectively. Interestingly, if ‘4’ (optimal for the 30 FPS framerate) is chosen as the compute platform, it becomes compute-bound for the AscTec Pelican at 60 FPS, which drops the maximum safe velocity by 15% compared to design point ‘2’. Takeaway.
Choosing the computing platform (either general-purpose or custom-designed) in an ad hoc fashion can deteriorate the physical performance of the robot, with implications for mission energy (discussed in Section VII-D). Hence, when designing (or selecting) the computing platform for an aerial robot, one must account for the robot's cyber-physical parameters to achieve maximum performance.
C. Architectural Fine-Tuning
To show the effectiveness of architectural fine-tuning, we consider the AscTec Pelican with a 60 FPS sensor, but assume that the knee point (i.e., design ‘2’ in Fig. 9b) was not achieved. In this case, using the bag of architectural optimizations (frequency/node scaling), we are able to move the sub-optimal body-dynamics-bound and compute-bound designs (points ‘3’ and ‘4’ in Fig. 10a) to the knee point.
Body-Dynamics Bound.
Design ‘3’ (high-power, high-performance, and body-dynamics bound), clocked at 1 GHz in a 45 nm process node, has a compute throughput 3 × higher than the knee point for the AscTec Pelican. By scaling its frequency down to 125 MHz, AutoPilot brings this sub-optimal point closer to the knee point (denoted ‘3 ∗ ’), as shown in Fig. 10b. Lowering the frequency from 1 GHz to 125 MHz reduces power consumption from 7.5 W to 1 W (Fig. 10a). The power reduction shrinks the heat-sink requirement, making this design lighter and near-optimal. Compute-Bound.
Design ‘4’, clocked at 1 GHz in the 45 nm node, is compute-bound for the AscTec Pelican. To bring this design to the knee point, AutoPilot increases the accelerator's throughput by ∼ × without significantly increasing its power consumption: by scaling the process to 22 nm and clocking at 4 GHz, AutoPilot brings it closer to the knee point (Fig. 10b). Takeaway.
When the CPS co-design step (Section VII-B) cannot generate the optimal knee-point design, the architectural fine-tuning engine can be launched, using various optimization techniques to deliver the final knee-point design. The level of flexibility AutoPilot allows and the trade-offs it makes are configurable by the end user.

Fig. 10: Architectural fine-tuning for the AscTec Pelican: (a) DSE plot; (b) F-1 plot.
D. Mission Time/Energy Implications of Optimal System
The end goal of AutoPilot is to choose an onboard compute system (design point) that minimizes mission time and energy. To this end, we evaluate three robots: the AscTec Pelican (mini-UAV), the DJI Spark (micro-UAV), and the nano-UAV used in Zhang et al. [89]. We show that the optimal design point (i.e., the knee point) generated by AutoPilot always outperforms the non-optimal designs (other AutoPilot-generated designs or ad hoc selections of onboard compute, e.g., the TX2).
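The comparisons in this subsection follow directly from Eq. (5): mission energy is mission time multiplied by total power draw, and mission time is distance over safe velocity. A sketch with illustrative (not measured) power numbers:

```python
def mission_energy_j(distance_m, v_safe_m_s, p_rotors_w, p_compute_w, p_others_w):
    """Eq. (5): E_mission = t_mission * (P_rotors + P_compute + P_others)."""
    t_mission = distance_m / v_safe_m_s
    return t_mission * (p_rotors_w + p_compute_w + p_others_w)

# A heavier, higher-TDP compute lowers v_safe, raising mission energy even
# though its own power draw is a small fraction of the total (rotors dominate).
knee = mission_energy_j(100.0, 10.0, p_rotors_w=200.0, p_compute_w=2.0, p_others_w=5.0)
adhoc = mission_energy_j(100.0, 5.0, p_rotors_w=200.0, p_compute_w=8.0, p_others_w=5.0)
# knee = 2070 J, adhoc = 4260 J: roughly 2x lower mission energy at the knee point
```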
Mission Time Comparisons.
To estimate mission time, we assume a package delivery mission in which a radius of 100 m separates the source and destination. We pick two categories of points, namely the knee point and others (compute-bound, body-dynamics bound, and ad hoc selections of computing platform), for the three aerial robots. For each design point, we estimate the maximum velocity the robot achieves when using that design as its onboard compute.

Fig. 11a shows mission times (lower is better) for five different computing platforms across the AscTec Pelican (mini-UAV), DJI Spark, and nano-UAV. The AutoPilot-generated optimal design (knee point) always achieves the lowest mission time. Notably, selecting the knee-point design becomes more critical as the aerial robot is miniaturized (mini-UAV ⇒ micro-UAV ⇒ nano-UAV). For the AscTec Pelican (mini-UAV), the mission-time improvement between the AutoPilot-generated knee point and the body-dynamics-bound design point (also generated by AutoPilot) is only 5%, whereas for the micro-UAV and nano-UAV the difference is 20% and 80%, respectively. The improvement for the AscTec Pelican is marginal because it is a larger drone with a higher payload capacity; the 4 W TDP difference between the knee-point and body-bound design points, and the associated extra heat-sink weight, is too small to significantly degrade its body dynamics (a_max) and safe velocity (Eq. 4). For the DJI Spark and nano-UAV [89], however, the payload capacity is lower, and the extra heat-sink weight (compute TDP) can significantly lower the acceleration (a_max) and safe velocity. Mission Energy Comparisons.
Fig. 11b shows the mission energy for the three drone platforms with five different compute platforms. The knee-point design always lowers mission energy compared to the other selections. Mission energy (E_mission) is related to mission time as follows:

E_mission = t_mission × (P_rotors + P_compute + P_others)    (5)

where t_mission is the time to complete the mission, and P_rotors, P_compute, and P_others are the power consumed by the rotors, the onboard compute, and the other components (sensor, flight controller, etc.) of the aerial robot. It is important to note that P_rotors consumes more than 95% of the total power [11], [29], but a higher P_compute (higher TDP ⇒ heavier heat sink) can lower the acceleration (a_max), which lowers V_safe (and raises t_mission). Thus, the knee-point design lowers mission energy by minimizing t_mission (higher V_safe ⇒ lower t_mission) and minimizing P_compute compared to the other design points (compute-bound, body-bound, or ad hoc selections). The optimal designs generated by AutoPilot for the AscTec Pelican, DJI Spark, and nano-UAV achieve 2 ×, 1.54 ×, and 1.81 × lower mission energy, respectively, compared to the other designs.

Fig. 11: Comparison of AutoPilot-generated points with other designs (the TX2 is marked "Does Not Fit" in both panels). All points except P-DroNet (PULP-DroNet) [29] run the same policy. For P-DroNet, we use the numbers reported in their work.

VIII. Related Work
Performance Models.
Analytical performance models, such as multicore Amdahl's law [91], the Roofline model [36], Gables [17], and several others [92], are useful for guiding the design of an optimal system for a given workload. These models target traditional compute and are not explicitly aimed at robots, which have both cyber and physical components. Our work proposes a roofline-like model to help understand the role of computing in aerial robots. In the context of performance modelling for complex systems (i.e., beyond compute-only systems), cote [93] is a full-system model for the design and control of nano-satellites. The cote model takes into account orbital mechanics and physical bounds on communication, computation, and data storage to design a cost-effective, low-latency, and scalable nano-satellite system. The F-1 model has a similar objective: it combines the interactions between compute/sensor (cyber components) and body dynamics (physical components) to understand the various bottlenecks in building an optimal system.
Accelerators for Robots.
Recently, a low-power accelerator [29] was proposed for neural-network-based control, but that work is customized for nano-drones running DroNet [12]. Our work provides a general methodology to generate multiple NN policies and hardware accelerator designs from a high-level specification. Navion [94] is a specialized accelerator for aerial robots in the sense-plan-act control paradigm, aimed at improving visual-inertial odometry; we focus on end-to-end control algorithms, an emerging autonomy paradigm. RoboX [95] generates an accelerator for motion predictive control from a high-level DSL. Though the high-level goal is the same, our work differs in that RoboX does not consider the effect of cyber-physical parameters on the computing platform; we instead contribute the F-1 model to quantify the optimality of our designs. Outside of aerial robots, prior work [96], [97] has shown the benefits of custom hardware accelerators for motion planning algorithms for robotic arms. Though the robots differ, AutoPilot, guided by the F-1 model, can build similarly optimal motion planning hardware accelerators targeted at aerial robots.
IX. Conclusion
AutoPilot is a push-button solution that automates cyber-physical co-design to generate an optimal control algorithm (NN policy) and its hardware accelerator from a high-level user specification. The concepts developed for AutoPilot, such as cyber-physical co-design, the F-1 model for identifying the optimal design point, architectural fine-tuning, and selecting design points by their effect on the overall mission, can be adapted to other types of autonomous robots, such as self-driving cars.
References
[1] A. Timothy, M. N. Paul, A. T. Aaron, B. Joan, and S. Jeff, "Drone transportation of blood products," TRANSFUSION Journal. arXiv preprint arXiv:2009.06034, pp. 7832-7839, IEEE, 2018.
[9] G. Loianno, D. Scaramuzza, and V. Kumar, "Special issue on high-speed vision-based autonomous navigation of UAVs," Journal of Field Robotics, vol. 1, no. 1, pp. 1-3, 2018.
[10] S. Li, M. M. Ozo, C. De Wagter, and G. C. de Croon, "Autonomous drone race: A computationally efficient vision-based navigation and control strategy," arXiv preprint arXiv:1809.05958, 2018.
[11] B. Boroujerdian, H. Genc, S. Krishnan, W. Cui, A. Faust, and V. J. Reddi, "MAVBench: Micro aerial vehicle benchmarking," 2019.
[12] A. Loquercio, A. I. Maqueda, C. R. Del-Blanco, and D. Scaramuzza, "DroNet: Learning to fly by driving," IEEE Robotics and Automation Letters, vol. 3, no. 2, pp. 1088-1095, 2018.
[13] F. Sadeghi and S. Levine, "CAD2RL: Real single-image flight without a single real image," arXiv preprint arXiv:1611.04201, 2016.
[14] S. Krishnan, B. Boroujerdian, W. Fu, A. Faust, and V. J. Reddi, "Air Learning: An AI research platform for algorithm-hardware benchmarking of autonomous aerial robots," CoRR, vol. abs/1906.00421, 2019.
[15] M. Havasi and J. M. Lobato, "Bayesian optimization," https://github.com/cambridge-mlg/gem5-aladdin/tree/master/bo script, 2018.
[16] E. D. Lazowska, J. Zahorjan, G. S. Graham, and K. C. Sevcik, Quantitative System Performance: Computer System Analysis Using Queueing Network Models. Prentice-Hall, Inc., 1984.
[17] M. Hill and V. J. Reddi, "Gables: A roofline model for mobile SoCs," in Proceedings 2007 IEEE International Conference on Robotics and Automation, pp. 361-366, IEEE, 2007.
[21] W. Koch, R. Mancuso, and A. Bestavros, "Neuroflight: Next generation flight control firmware," arXiv preprint arXiv:1901.06553, pp. 1484-1491, IEEE, 2016.
[24] R. B. Rusu and S. Cousins, "3D is here: Point Cloud Library (PCL)," pp. 1-4, IEEE, 2011.
[25] A. Elfes, "Using occupancy grids for mobile robot perception and navigation," Computer, vol. 22, no. 6, pp. 46-57, 1989.
[26] M. G. Dissanayake, P. Newman, S. Clark, H. F. Durrant-Whyte, and M. Csorba, "A solution to the simultaneous localization and map building (SLAM) problem," IEEE Transactions on Robotics and Automation, vol. 17, no. 3, pp. 229-241, 2001.
[27] S. Karaman and E. Frazzoli, "Sampling-based algorithms for optimal motion planning," The International Journal of Robotics Research, vol. 30, no. 7, pp. 846-894, 2011.
[28] D. Gonzalez, J. Perez, V. Milanes, and F. Nashashibi, "A review of motion planning techniques for automated vehicles," IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 4, pp. 1135-1145, 2016.
[29] D. Palossi, A. Loquercio, F. Conti, E. Flamand, D. Scaramuzza, and L. Benini, "A 64mW DNN-based visual navigation engine for autonomous nano-drones," IEEE Internet of Things Journal.
— et al., "End to end learning for self-driving cars," arXiv preprint arXiv:1604.07316, 2016.
[32] S. Ross, N. Melik-Barkhudarov, K. S. Shankar, A. Wendel, D. Dey, J. A. Bagnell, and M. Hebert, "Learning monocular reactive UAV control in cluttered natural environments," pp. 1765-1772, IEEE, 2013.
[33] N. Smolyanskiy, A. Kamenev, J. Smith, and S. Birchfield, "Toward low-flying autonomous MAV trail navigation using deep neural networks for environmental awareness," pp. 4241-4247, IEEE, 2017.
[34] D. Kalashnikov, A. Irpan, P. Pastor, J. Ibarz, A. Herzog, E. Jang, D. Quillen, E. Holly, M. Kalakrishnan, V. Vanhoucke, et al., "QT-Opt: Scalable deep reinforcement learning for vision-based robotic manipulation," arXiv preprint arXiv:1806.10293, 2018.
[35] A. Kendall, J. Hawke, D. Janz, P. Mazur, D. Reda, J.-M. Allen, V.-D. Lam, A. Bewley, and A. Shah, "Learning to drive in a day," pp. 8248-8254, IEEE, 2019.
[36] S. Williams, A. Waterman, and D. Patterson, "Roofline: an insightful visual performance model for multicore architectures,"
Communicationsof the ACM , vol. 52, no. 4, pp. 65–76, 2009.[37] M. Watterson and V. Kumar, “Safe receding horizon control for aggressivemav flight with limited range sensing,” in , pp. 3235–3240,IEEE, 2015.[38] S. Liu, M. Watterson, K. Mohta, K. Sun, S. Bhattacharya, C. J. Taylor,and V. Kumar, “Planning dynamically feasible trajectories for quadrotors sing safe flight corridors in 3-d complex environments,” IEEE Roboticsand Automation Letters , vol. 2, no. 3, pp. 1688–1695, 2017.[39] “Field of view.” https://en.wikipedia.org/wiki/Field of view, 2020.[40] R. Mahony, V. Kumar, and P. Corke, “Multirotor aerial vehicles: Modeling,estimation, and control of quadrotor,”
IEEE Robotics and Automationmagazine , vol. 19, no. 3, pp. 20–32, 2012.[41] J. Wei, J. M. Snider, J. Kim, J. M. Dolan, R. Rajkumar, and B. Litkouhi,“Towards a viable autonomous driving research platform,” in , pp. 763–770, IEEE, 2013.[42] F. Santoso, M. A. Garratt, and S. G. Anavatti, “Visual–inertial navigationsystems for aerial robotics: Sensor fusion and technology,”
IEEETransactions on Automation Science and Engineering , vol. 14, no. 1,pp. 260–275, 2016.[43] L. Smith and V. Aitken, “The auxiliary extended and auxiliary unscentedkalman particle filters,” in , pp. 1626–1630, IEEE, 2007.[44] A. I. Mourikis and S. I. Roumeliotis, “A multi-state constraint kalmanfilter for vision-aided inertial navigation,” in
Proceedings 2007 IEEEInternational Conference on Robotics and Automation , pp. 3565–3572,IEEE, 2007.[45] H. Wang, H. Zhao, J. Zhang, D. Ma, J. Li, and J. Wei, “Survey onunmanned aerial vehicle networks: A cyber physical system perspective,”
IEEE Communications Surveys & Tutorials
Machine vision and applications , vol. 27, no. 7, pp. 1005–1020, 2016.[50] S. Liu, M. Watterson, S. Tang, and V. Kumar, “High speed navigationfor quadrotors with limited onboard sensing,” in
IEEE internationalconference on robotics and automation (ICRA) arXiv:1906.00421 , 2019.[60] https://github.com/harvard-edge/airlearning, title = Air Learning Envi-ronment Generator, year = 2020.[61] J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel,“Domain randomization for transferring deep neural networks fromsimulation to the real world,” in , pp. 23–30, IEEE, 2017.[62] A. Samajdar, Y. Zhu, P. Whatmough, M. Mattina, and T. Krishna, “Scale-sim: Systolic cnn accelerator simulator,” arXiv:1811.02883 , 2018.[63] S. Li, K. Chen, J. H. Ahn, J. B. Brockman, and N. P. Jouppi, “Cacti-p:Architecture-level modeling for sram-based structures with advancedleakage reduction techniques,” in
Proceedings of the InternationalConference on Computer-Aided Design ∼ /media/documents/products/power-calculator/ddr4 power calc.xlsm, 2016.[65] J. Snoek, H. Larochelle, and R. P. Adams, “Practical Bayesian optimiza-tion of machine learning algorithms,” in NIPS , pp. 2960–2968, 2012. [66] B. Shahriari et al. , “Taking the human out of the loop: a review ofBayesian optimization,”
Proceedings of the IEEE , pp. 148–175, 2016.[67] B. Reagen et al. , “A case for efficient accelerator design space explorationvia Bayesian optimization,” in
ISLPED , 2017.[68] S. Shalev-Shwartz, S. Shammah, and A. Shashua, “On a formal modelof safe and scalable self-driving cars,” arXiv preprint arXiv:1708.06374 arXiv preprint arXiv:1510.00149 , 2015.[72] L. Pentecost, M. Donato, B. Reagen, U. Gupta, S. Ma, G.-Y. Wei, andD. Brooks, “Maxnvm: Maximizing dnn storage density and inferenceefficiency with sparse encoding and error mitigation,” in
Proceedings ofthe 52nd Annual IEEE/ACM International Symposium on Microarchitec-ture , MICRO ’52, (New York, NY, USA), pp. 769–781, Association forComputing Machinery, 2019.[73] S. Lupashin, M. Hehn, M. W. Mueller, A. P. Schoellig, M. Sherback, andR. D’Andrea, “A platform for aerial robotics research and demonstration:The flying machine arena,”
Mechatronics , vol. 24, no. 1, pp. 41–54,2014.[74] N. Michael, D. Mellinger, Q. Lindsey, and V. Kumar, “The grasp multiplemicro-uav testbed,”
IEEE Robotics & Automation Magazine , vol. 17,no. 3, pp. 56–65, 2010.[75] J. P. How, J. Teo, and B. Michini, “Adaptive flight control experimentsusing raven,”
Simulation , vol. 1, p. 1.[76] I. Palunko, P. Cruz, and R. Fierro, “Agile load transportation: Safeand efficient load manipulation with aerial robots,”
IEEE robotics &automation magazine , vol. 19, no. 3, pp. 69–79, 2012.[77] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wier-stra, and M. Riedmiller, “Playing atari with deep reinforcement learning,” arXiv preprint arXiv:1312.5602 , 2013.[78] R. Polvara, M. Patacchiola, S. Sharma, J. Wan, A. Manning, R. Sutton,and A. Cangelosi, “Toward end-to-end control for uav autonomous land-ing via deep reinforcement learning,” in , pp. 115–123, IEEE, 2018.[79] C. Yan, X. Xiang, and C. Wang, “Towards real-time path planningthrough deep reinforcement learning for a uav in dynamic environments,”
Journal of Intelligent & Robotic Systems , Sep 2019.[80] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classificationwith deep convolutional neural networks,” in
Advances in neuralinformation processing systems , pp. 1097–1105, 2012.[81] H. Li, M. Bhargav, P. N. Whatmough, and H. . Philip Wong, “On-chipmemory technology design space explorations for mobile deep neuralnetwork accelerators,” in
IEEE Transactions on ConsumerElectronics , vol. 56, pp. 1185–1190, Aug 2010.[85] “ARMv8-M Technical Reference manual.”[86] “Arm Cortex-M33.”[87] “Heat sink size calculator.” https://celsiainc.com/resources/calculators/heat-sink-size-calculator/. (Accessed on 01/29/2020).[88] “Coral-som-datasheet, howpublished = https://coral.ai/static/files/coral-som-datasheet.pdf, month = , year = , note = .”[89] X. Zhang, B. Xian, B. Zhao, and Y. Zhang, “Autonomous flight control ofa nano quadrotor helicopter in a gps-denied environment using on-boardvision,”
IEEE Transactions on Industrial Electronics , vol. 62, no. 10,pp. 6392–6403, 2015.[90] A. Giusti, J. Guzzi, D. C. Cires¸an, F.-L. He, J. P. Rodr´ıguez, F. Fontana,M. Faessler, C. Forster, J. Schmidhuber, G. Di Caro, et al. , “A machinelearning approach to visual perception of forest trails for mobile robots,”
IEEE Robotics and Automation Letters , vol. 1, no. 2, pp. 661–667, 2015.[91] M. D. Hill and M. R. Marty, “Amdahl’s law in the multicore era,”
Computer , vol. 41, no. 7, pp. 33–38, 2008.
92] E. J. Kim, K. H. Yum, and C. R. Das, “Introduction to analytical models,”
Performance Evaluation and Benchmarking , p. 193, 2018.[93] B. Denby and B. Lucia, “Orbital edge computing: Nanosatellite constel-lations as a new class of computer system,” in
Proceedings of the Twenty-Fifth International Conference on Architectural Support for ProgrammingLanguages and Operating Systems , pp. 939–954, 2020.[94] A. Suleiman, Z. Zhang, L. Carlone, S. Karaman, and V. Sze, “Navion:A 2-mw fully integrated real-time visual-inertial odometry acceleratorfor autonomous navigation of nano drones,”
IEEE Journal of Solid-StateCircuits , vol. 54, no. 4, pp. 1106–1119, 2019.[95] J. Sacks, D. Mahajan, R. C. Lawson, and H. Esmaeilzadeh, “Robox: anend-to-end solution to accelerate autonomous control in robotics,” in
Proceedings of the 45th Annual International Symposium on ComputerArchitecture , pp. 479–490, IEEE Press, 2018.[96] S. Murray, W. Floyd-Jones, Y. Qi, G. Konidaris, and D. J. Sorin, “The mi-croarchitecture of a real-time robot motion planning accelerator,” in , pp. 1–12, IEEE, 2016.[97] S. Murray, W. Floyd-Jones, Y. Qi, D. J. Sorin, and G. D. Konidaris,“Robot motion planning on a chip.,” in
Robotics: Science and Systems ,2016.,2016.