Machine Learning-Based Automated Design Space Exploration for Autonomous Aerial Robots

Srivatsan Krishnan†, Zishen Wan†, Kshitij Bharadwaj†, Paul Whatmough∓, Aleksandra Faust§, Sabrina M. Neuman†, Gu-Yeon Wei†, David Brooks†, and Vijay Janapa Reddi†
†Harvard University  ∓ARM Research  §Google Brain Research
Abstract—Building domain-specific architectures for autonomous aerial robots is challenging due to a lack of systematic methodology for designing onboard compute. We introduce a novel performance model called the F-1 roofline to help architects understand how to build a balanced computing system for autonomous aerial robots, considering both the cyber components (sensor rate, compute performance) and the physical components (body dynamics) that affect the performance of the machine. We use F-1 to characterize commonly used learning-based autonomy algorithms on onboard platforms to demonstrate the need for cyber-physical co-design. To navigate the cyber-physical design space automatically, we subsequently introduce AutoPilot, a push-button framework that automates the co-design of cyber-physical components for aerial robots from a high-level specification, guided by the F-1 model. AutoPilot uses Bayesian optimization to automatically co-design the autonomy algorithm and hardware accelerator while considering various cyber-physical parameters, generating an optimal design under different task-level complexities for different robots and sensor framerates. As a result, designs generated by AutoPilot on average lower mission time by up to 2× over baseline approaches, conserving battery energy.

I. Introduction
Autonomous robots, such as self-driving cars and aerial robots, are on the rise [1]–[6]. Building computing systems for these domains is challenging because autonomous robots differ from traditional computing systems (embedded systems, servers, etc.): a robot must sense the environment through its sensors, make real-time decisions (e.g., detection and evasion) with the available onboard compute, and actuate itself within the environment (e.g., evade an obstacle). These robots have cyber components (sensor/compute) and physical components (such as frames/rotors) that interact with one another to work as one coherent system. Hence, autonomous robots are cyber-physical systems (CPS), and the traditional computing platform is just one component among many others.

To design the optimal onboard compute, we need to perform cyber-physical co-design. The selection of the cyber and physical components affects the system "performance" (i.e., velocity, mission time, energy) of the aerial robot. For instance, cyber quantities, such as the sensor framerate and the processing rate of the sensor data, determine how fast the aerial robot reacts in a dynamic environment, which in turn determines the safe velocity. Physical quantities, such as weight (frame, payload), determine whether the physics allows the robot to accelerate and move faster.

To perform cyber-physical co-design, we must first understand the role of computing (specifically in autonomous aerial robots), and then design domain-specific architectures. To intuitively understand the role of computing in such a cyber-physical system, we introduce the "Formula-1" (F-1) visual performance model to guide the design of optimal systems for a given robot task. F-1 determines which of the cyber-physical components (compute, sensor, body) determines the safe operating velocity; safe high-speed autonomous navigation remains one of the key challenges in enabling aerial robot applications [7]–[10].
Safety ensures that the control algorithm is reactive to a dynamic environment, while high-speed navigation ensures that the aerial robot finishes tasks quickly, thereby lowering mission time and energy [11].

Using F-1, we show that a performant aerial robot requires careful co-design of the autonomy algorithm as well as the underlying hardware, along with the cyber-physical parameters of the aerial robot. We evaluate two popular learning-based autonomy algorithms, DroNet [12] and VGG-16 (CAD2RL) [13], on computing platforms used in aerial robots, namely the Nvidia Xavier, Nvidia TX2, Intel NCS, and Ras-Pi. Our observations show that ad-hoc selection of autonomy algorithms or onboard computing platforms is far from optimal.

To efficiently design domain-specific architectures while being cognizant of the cyber-physical parameters, we introduce AutoPilot, an intelligent cyber-physical design space exploration framework that uses the F-1 model to automatically generate the optimal (learning-based) autonomy algorithm and its associated hardware accelerator from high-level user-defined robot task, platform constraint, and optimization target specifications. AutoPilot consists of two parts: (1) a learning-based autonomy algorithm generator and (2) a multi-objective algorithm-hardware tuner. The algorithm generator focuses on producing a functionally correct learning-based autonomy algorithm: AutoPilot automatically trains and tests the neural network-based autonomy algorithm for a given aerial robot task using deep reinforcement learning (RL) [14]. The tuner uses multi-objective Bayesian optimization [15] to automatically tune the learning-based autonomy algorithm's hyperparameters and the hardware accelerator parameters simultaneously to meet the optimization target (e.g., high safe velocity, lower mission energy) specified in the high-level specification.

Fig. 1: (a) Robot components, their interactions, paradigms for achieving autonomy, and the low-level flight controller. (b) The action throughput of the sense-compute-control pipeline depends upon the throughput of the sensor, compute, and controller sub-systems [16], [17].
We use AutoPilot to automatically generate Pareto-optimal design points for aerial robot navigation tasks. We show AutoPilot's ability to generate these design points for three different target drone platforms (mini-UAV, micro-UAV, and nano-UAV) with sensor framerates of 30 FPS and 60 FPS. We show that AutoPilot's generated optimal design point achieves up to 2×, 1.54×, and 1.81× lower mission energy for the mini-UAV, micro-UAV, and nano-UAV, respectively, compared to using commercial off-the-shelf accelerators (Nvidia TX2) or other non-optimal design points generated by AutoPilot. Our results show the importance of cyber-physical co-design, as opposed to ad-hoc stand-alone design of onboard computing platforms, and the implication of selecting optimal design points on mission time and mission energy.

In summary, we make the following contributions:
1) F-1, a visual performance model to understand the role of a computing platform in aerial robots while considering other components such as the sensor and body dynamics.
2) AutoPilot, an intelligent cyber-physical design space exploration framework that allows us to automatically co-design a learning-based control algorithm with the accelerator from a high-level user specification.
3) Exploiting cyber-physical co-design to maximize safe flying velocity while minimizing overall mission energy.

II. Autonomous Aerial Robot Background
This section provides background on the key components in aerial robots, the role of the flight controller, and a brief overview of the two control algorithm paradigms, namely "Sense-Plan-Act" and "End-to-End Learning".
A. Aerial Robot Components
Autonomous aerial robots typically have three key components: rotors, sensors, and an onboard computing platform. The rotors determine the thrust that the aerial robot can generate. The sensor (e.g., a camera) allows the aerial robot to sense the environment. The onboard compute executes the autonomy algorithm to process the sensor data. The size of an aerial machine plays an important role in component selection.
B. Flight Controller
The task of the flight controller is stabilization and control of the aerial robot. It is designed in a multi-level hierarchical fashion and is realized using PID controllers. The flight-controller firmware stack is computationally light and typically runs on microcontrollers [18], [19]. To stabilize the drone against unpredictable disturbances (sudden winds or damaged rotors), the inner loop typically runs at closed-loop frequencies of up to 1 kHz [20], [21].
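A minimal sketch of one such PID loop, stepped at the 1 kHz inner-loop rate mentioned above. The gains and the roll-error example are illustrative assumptions, not actual flight-controller firmware:

```python
# Illustrative single PID loop of the kind cascaded hierarchically in a
# flight controller (gains are made up for the example).
class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Inner stabilization loop at 1 kHz (dt = 1 ms), as in the text.
pid = PID(kp=1.0, ki=0.1, kd=0.05, dt=1e-3)
command = pid.step(setpoint=0.0, measurement=0.2)  # counteract a roll error
```

A positive roll error here yields a negative corrective command; in a real stack several such loops are cascaded (attitude rate, attitude, position).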
C. Onboard Compute
In addition to the flight controller, there is a separate, dedicated computer responsible for generating high-level actions from various autonomy algorithms (described later in Section II-D). Nano-UAVs, due to their size and weight limits, typically use microcontrollers as the onboard computing platform. For example, the CrazyFlie [22] weighs less than 27 g and is powered by an ARM Cortex-M4 microcontroller. On the other end are mini-UAVs, which are bigger and have a higher payload capacity. Mini-UAVs typically use a general-purpose computing platform such as an Intel NUC or Nvidia Jetson TX1/TX2. For example, the AscTec Pelican [23], which weighs 1.6 kg, is powered by an Intel NUC platform.
D. Autonomy Algorithms
Autonomous behavior of an aerial robot is achieved by algorithms that fall into two broad categories, namely "Sense-Plan-Act" and "End-to-End Learning". In "Sense-Plan-Act," the algorithm is broken into three or more distinct stages, namely the sensing stage, the planning stage, and the control stage. In the sensing stage, the sensor data is used to create a map [24]–[26] of the environment. The planning stage [27], [28] processes the map to determine the best (e.g., collision-free) trajectory. The trajectory information is used by the control stage, which actuates the rotors so the robot stays on the trajectory. The execution time for these algorithms varies from hundreds of milliseconds to a few seconds [11].

End-to-End learning methods, which we focus on in this work, directly process raw input sensor information (e.g., RGB, Lidar, etc.) and use a neural network model to produce output actions directly. Unlike the Sense-Plan-Act paradigm, end-to-end learning methods do not require maps or separate planning stages and hence are much faster than non-NN-based autonomy algorithms [29], [30]. The model can be trained using supervised learning [12], [31]–[33] or reinforcement learning [13], [34], [35].
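The end-to-end paradigm can be sketched in a few lines: a single forward pass maps a raw frame straight to an action, with no map or planner in between. The tiny two-layer "network" and the three-action set below are illustrative stand-ins, not DroNet or VGG-16:

```python
import random

# Stand-in end-to-end policy: flattened 8x8 frame -> hidden layer -> action.
random.seed(0)
IN, HID, OUT = 64, 16, 3   # input pixels, hidden units, {left, forward, right}
W1 = [[random.gauss(0, 0.1) for _ in range(HID)] for _ in range(IN)]
W2 = [[random.gauss(0, 0.1) for _ in range(OUT)] for _ in range(HID)]

def policy(frame):
    x = [p for row in frame for p in row]                    # flatten the frame
    h = [max(0.0, sum(x[i] * W1[i][j] for i in range(IN)))   # ReLU hidden layer
         for j in range(HID)]
    logits = [sum(h[j] * W2[j][k] for j in range(HID)) for k in range(OUT)]
    return logits.index(max(logits))                         # argmax action

frame = [[random.random() for _ in range(8)] for _ in range(8)]
action = policy(frame)
```

The latency of exactly this forward pass (T_compute in the next section) is what the choice of onboard accelerator determines.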
III. F-1 Performance Model
In this section, we introduce the F-1 visual performance model, which helps computer architects understand whether a robot's performance is bottlenecked by the selection (or design) of the compute (and autonomy algorithm), or by other components in the aerial robot such as the sensor or its body dynamics (laws of physics). We first start with an overview of F-1 as a performance model and explain how it can be useful. Then we describe how we construct the F-1 model.

The F-1 model visually resembles the traditional computer-system roofline model [36], albeit the parameters in the F-1 model quantify the aerial robot as a holistic system as opposed to the compute system in isolation. Similar to the roofline model, the F-1 model can be used by computer architects in two ways. First, it can be used as a visual performance model to understand various bounds and bottlenecks. Second, it can identify an optimal system (autonomy algorithm + onboard compute) for an aerial robot.
A. Need for a Cyber-Physical Performance Model
The rate at which motion decisions are made in a drone depends on the speeds of the components within the sensor-compute-control pipeline (Fig. 1b): the sensor capturing a snapshot (e.g., an image) of the environment, the computer processing the sensor data to generate high-level decisions, and the controller realizing the final decisions. The slowest of the sensor, compute, and control subsystems creates the upper bound on the rate at which final decisions are generated. The decision-making rate determines how fast an intelligent agent (biological or mechanical) can travel while maintaining maneuverability. For example, consider the case of a drone flying through a crowded obstacle course. While the drone's response time to new stimuli is governed by the total latency of the entire sensor-compute-control pipeline, the rate at which the drone can output motor actions is tied to the maximum throughput of that pipeline (Fig. 1b). As long as the total latency is low enough to perceive and track objects in the environment (e.g., obstacles, other drones), the speed with which the drone can maneuver through obstacles with agility is limited by the rate at which valid decision actions can be output by the pipeline (i.e., the throughput).

Our insight is that this problem resembles the canonical rate-matching problem in computer systems. Computer architects are familiar with modeling this using analytical models such as bottleneck analysis [16], Roofline [36], and Gables [17]. However, to achieve high-speed agility for drones, one must also consider the effect of physical quantities (governed by physics) and how they affect the selection of the sensor, compute, and control subsystems. Traditional computer-architecture models fall short of capturing these effects. Hence, to design an agile high-speed drone, one must factor in both the physical quantities and the rate matching of the individual subsystems. The F-1 model unifies the parameters determining the decision-making rate and the parameters determining the drone's physics to effectively realize agile high-speed flight.
B. Using the F-1 Model
The F-1 model defines the upper bound on the safe velocity, considering the maximum rate at which the drone's sensor-compute-control pipeline can make a decision. Responsiveness within a safe perceptual operating regime is the typical use case for most drones, and to ensure that the drone stays in that safe regime, it can be programmed to invoke a stopping policy [23], [37], [38]. Our work focuses on operating efficiently within the safe regime to maximize agile velocity, and thus minimize mission time and battery energy.

The F-1 model is a log-scale plot of safe velocity (V_safe) on the y-axis against "action throughput" (f_action) on the x-axis (Fig. 2a). The action throughput is the throughput of the sensor-compute-control pipeline, i.e., the rate at which decisions (e.g., move forward, turn left, etc.) are generated. Safe velocity (V_safe) is defined as the velocity at which an aerial robot can travel without colliding with an obstacle. Any speed less than or equal to V_safe guarantees safety, while any speed exceeding V_safe is considered unsafe.

The F-1 model shows that a robot's velocity increases with improving throughput of its sense-compute-control pipeline only up to a point, after which it is independent of the pipeline's throughput. We define the decision-making rate of the robot as f_action, and its inverse, the control period, as T_action. Because the stages in the sensor-compute-control pipeline can run concurrently, the minimum control period of the pipeline can never be smaller than the maximum latency across the subsystems:

max(T_sensor, T_compute, T_control) ≤ T_action    (1)

If the stages of the pipeline are not fully overlapped, the smallest practical control period may approach the total pipeline latency:

T_action ≤ T_sensor + T_compute + T_control    (2)

While the perceptual responsiveness to new stimuli (i.e., latency) is fixed at the upper bound in Eq. 2, through successful pipelining we can output new control actions at a higher rate, approaching the lower bound in Eq. 1. As long as the robot's perceptual responsiveness is within a safe operating regime, as mentioned earlier, this allows the robot to execute complicated maneuvers at a higher traveling velocity, making for a more agile drone with shorter mission times. The upper bound on the action throughput (f_action) for a pipelined scenario follows from Eq. 1:

f_action = 1 / max(T_sensor, T_compute, T_control)    (3)

where T_sensor = 1/f_sensor is the latency to sample data from the sensor. If the aerial robot has a 60 FPS camera, the sensor data can be sampled at 16.67 ms intervals, which becomes the sensor latency. T_compute is the latency of the autonomy algorithm to estimate the high-level action commands; the algorithm running on the computing system feeds on the sensor data, and the compute throughput is a function of the autonomy algorithm (Section II-D) as well as the underlying hardware architecture. T_control = 1/f_control is the latency to generate the low-level actuation commands; typical values of f_control are upwards of 1 kHz [21].

Fig. 2: (a) Using the F-1 model to understand the different bounds, namely the compute, sensor, and body-dynamics bounds. The roofline is determined by the body-dynamics bound, while the compute and sensor bounds add ceilings to the F-1 model. (b) Determining the optimal design using the F-1 model. (c) Changing a_max leads to new rooflines in the F-1 model. (d) Changing the sensor range (d) leads to new rooflines.

With these terms in place, the F-1 visual performance model can be used to perform a bound-and-bottleneck analysis to determine whether the safe velocity is affected by the onboard sensor/compute. Any point to the left of the "knee-point" in F-1 (Fig. 2a) denotes that the safe velocity is bounded by the choice of compute (and autonomy algorithm) or sensor, and any point to the right of the knee-point denotes that the velocity is bounded by the body dynamics of the aerial robot. Ideally, to achieve the optimal pipeline design, the action throughput should be equal to that of the knee-point.
Body-Dynamics Bound.
An aerial robot's physical properties, such as its weight and the thrust produced by its rotors, determine how fast it can move; hence the ultimate bound on the safe velocity (V_safe) is determined by its body dynamics. We call the region to the right of the knee-point (i.e., where the sense-to-act throughput is greater than or equal to f_k) body-dynamics bound. In this region, unless the physical components are improved (e.g., by increasing the thrust-to-weight ratio), the velocity cannot exceed the current peak safe velocity no matter how fast a decision is made (i.e., regardless of the selection of a faster compute/sensor).

Sensor Bound.
The choice of onboard sensors limits the decision-making rate (f_action), which can limit the safe velocity (V_safe). As shown in Fig. 2a, a robot's velocity is sensor-bound if its action throughput is equal to the sensor's frame rate (f_sensor) but less than the knee-point throughput (f_k). The sensor-bound case occurs when the compute throughput (f_compute) is greater than or equal to the sensor throughput (f_sensor), so that the action throughput equals f_sensor according to Eq. 3, and f_sensor < f_k. In this scenario, the sensor adds a new ceiling to the F-1 model, bounding the velocity under V_s. In this region, unless the sensor throughput is improved (e.g., a higher-FPS sensor), the velocity cannot exceed the sensor-bound ceiling (V_s) no matter how fast the onboard compute can process the sensor input.

Compute Bound.
The choice of onboard compute (or autonomy algorithm) also affects the decision-making rate (f_action). As also shown in Fig. 2a, a robot's velocity is compute-bound if its action throughput (f_c) is less than both the sensor's frame rate (f_s) and the knee-point throughput (f_k). In this case, the computing platform adds a new ceiling to the roofline model, bounding the velocity under this limit (V_c). In this region, unless the compute performance is improved (e.g., via hardware accelerators or algorithm-hardware co-design), the velocity cannot exceed V_c.

Optimal Design.
The F-1 model can identify system designs that achieve an optimal, balanced overall system capability. Fig. 2b shows how understanding the bounds on safe velocity using F-1 can help in designing an optimal system for aerial robots. For a given robot with fixed mechanical properties, changing the sensor type or onboard compute impacts f_action. Consequently, the optimal design point is where the sensor throughput and compute throughput result in an action throughput equal to the knee-point throughput (f_k).

Over-Optimal Design.
If the action throughput is f_over such that f_over > f_k, then the sensor/compute is over-optimized, since any value greater than f_k yields no improvement in the velocity of the aerial robot. Such an over-designed computing/sensor platform involves not only extra optimization effort but also burns additional power, which further increases the drone's total power and decreases its overall battery life.

Sub-Optimal Design.
The F-1 model can help architects understand the performance gap between the current compute design and the optimal design. For instance, if the action throughput is f_sub, such that f_sub < f_k, then the sensor/compute is under-optimized, which signifies that the current system is off by (f_k − f_sub) and there is scope for improvement through a better algorithm or a better selection (or design) of the computing system.

C. Constructing the F-1 Model
In this section, we describe how we construct the F-1 model, starting from prior work [23] that has established and validated the relationship between the cyber-physical parameters and the safe velocity of the aerial robot, as described by Eq. 4:

V_safe = a_max (sqrt(T_action² + 2d/a_max) − T_action)    (4)

Eq. 4 states that if the robot's body dynamics (physics) permit it to accelerate at most by a_max, its compute and sensors permit it to sense and act at an interval of T_action (1/f_action), and its sensor(s) can sense the environment as far as d meters, then the robot can travel as fast as V_safe.

For instance, Fig. 3 depicts an aerial robot with its field of view (FoV) [39] and an obstacle (e.g., a tree or a bird) within the FoV. The FoV is the region of the environment that the sensor can observe. In this scenario, the aerial robot can travel at most at V_safe and still stop without colliding with the obstacle.

Fig. 3: Maximum safe velocity for a given aerial robot.

To construct the model, we sweep T_action from 0 → ∞ for a fixed maximum acceleration (a_max = 50 m/s²) and sensor range (d = 10 m), as shown in Fig. 4a. We observe an asymptotic relation between velocity and T_action: as T_action → 0, the velocity → 32 m/s (as seen in the magnified portion of Fig. 4a); likewise, as T_action → ∞, the velocity → 0. We also plot f_action (the inverse of T_action) on the x-axis and velocity on the y-axis in Fig. 4b, with both axes on a linear scale. As T_action decreases (i.e., 1/T_action increases), there is a sudden transition in velocity (0 to 31 m/s), which saturates thereafter. We see that there is a point beyond which increasing f_action does not increase the velocity, showing a saturation or roofline. Fig. 4c plots the x-axis on a log scale, which reveals the transition that was not evident on the linear scale (Fig. 4b) or in the original CPS relation (Fig. 4a). We also annotate the three plots with two sample points, denoted point 'A' and the 'knee-point'. Point A has an f_action of 1 Hz, while the knee-point has an f_action of 100 Hz. Going from point A to the knee-point denotes a 100× improvement in action throughput and translates to an increase in velocity from 10 m/s to 30 m/s, whereas even a 100× improvement in f_action beyond the knee-point results in only a 1.0004× improvement in velocity (signifying no meaningful improvement). Hence, increasing the action throughput (e.g., with a faster computing platform or faster sensor) beyond a certain point yields no improvement in velocity.

To visualize the F-1 model (Fig. 2a), we need to show two regions: (i) where a robot's velocity depends on f_action, and (ii) where the velocity is independent of f_action.

D. Effects of Cyber and Physical Component Interaction
In this section, we show how the parameters in Eq. 4 couple the cyber and physical components in an aerial robot. The cyber components integrate the sensing, computation, and control pipeline in drones; their effect is abstracted by T_action (1/f_action) in Eq. 4. The physical components of an aerial robot, such as the mass of the sensor/compute/body frame/battery, the thrust-to-weight ratio, aerodynamic effects such as drag [40], and the sensing quality, are abstracted by the a_max and d parameters in Eq. 4.

The three parameters (T_action, a_max, d) in Eq. 4 can be used to capture the overheads of improving safety, reliability, and redundancy. For instance, the safety of autonomous vehicles can be improved by increasing the FoV [39] (i.e., reducing the blind spot) [41], by designing better tracking algorithms [42]–[44], and/or by adding redundancy in compute [45], [46].

The a_max parameter captures the physical effects of adding payload (sensor, onboard compute, battery, etc.) to the aerial robot. The payload weight affects the thrust-to-weight ratio [47], which lowers a_max [48]. The F-1 model captures the impact of varying a_max on V_safe: a higher a_max leads to a higher V_safe (with the roofline shifting upwards), as shown in Fig. 2c.

Fig. 4: The CPS relationship and the roofline model. (a) CPS model. (b) Linear scale. (c) Log scale.
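The sweep behind Fig. 4 and the parameter sensitivities of Figs. 2c/2d can be checked numerically from Eq. 4, V_safe = a_max(sqrt(T_action² + 2d/a_max) − T_action), using the text's values (a_max = 50 m/s², d = 10 m):

```python
import math

def v_safe(t_action, a_max, d):
    # Eq. 4: safe velocity given action period, max acceleration, sensor range.
    return a_max * (math.sqrt(t_action ** 2 + 2 * d / a_max) - t_action)

# Sweep from the text: a_max = 50 m/s^2, d = 10 m.
v_knee = v_safe(1 / 100, 50.0, 10.0)   # knee-point, f_action = 100 Hz -> ~31 m/s
v_a    = v_safe(1.0, 50.0, 10.0)       # point 'A', f_action = 1 Hz -> ~9 m/s
limit  = math.sqrt(2 * 50.0 * 10.0)    # T_action -> 0 asymptote, ~31.6 m/s

# Sensitivities of Figs. 2c/2d: raising a_max or d raises the roofline.
v_more_a = v_safe(1 / 100, 80.0, 10.0)   # higher thrust-to-weight ratio
v_more_d = v_safe(1 / 100, 50.0, 20.0)   # longer-range (e.g., laser) sensor
```

The T_action → 0 limit, sqrt(2 · a_max · d) ≈ 31.6 m/s, matches the ~32 m/s asymptote seen in the magnified portion of Fig. 4a.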
The d parameter captures the sensing quality of the aerial robot. For instance, a laser-based sensor can provide a longer sensing range, whereas a camera-array-based depth sensor has a limited range [49]. The F-1 model captures the impact of varying d on V_safe: a higher d leads to a higher V_safe (with the roofline and slope shifting upwards), as shown in Fig. 2d.

Lastly, the f_action parameter captures the effect of the sensor framerate, improvements to the autonomy algorithm, or the onboard compute. Any additional latency incurred due to extra sensing/computation (e.g., sensor fusion) affects f_action according to Eq. 3. The F-1 model captures the impact of varying f_action by adding new ceilings, which limit V_safe.

In summary, Eq. 4 couples the cyber and physical components and their associated effects into a single relationship. Thus, the F-1 model, which is built on Eq. 4, provides a unified performance model for computer architects to design onboard compute while taking the cyber-physical effects into account.

E. Validation and Generalizability
The F-1 model is derived by plotting the CPS relationship between safe velocity (V_safe) and throughput (f_action). The CPS relationship has been validated in environments with varying obstacle densities, both in simulation and in the real world on a quadcopter with wind speeds up to 7 m/s. The F-1 model applies to both autonomy algorithm paradigms (Section II-D) and to quadrotors of all sizes. As we show later, it is useful for analyzing nano, micro, and mini UAVs.
IV. F-1 Analysis of Off-the-shelf Compute
We use F-1 to characterize the performance of commonly used learning-based autonomy algorithms running on real-world computing platforms used in aerial robots. We show that commonly used autonomy algorithms and hardware platforms do not lead to optimal robot velocity, indicating that the choice of (1) onboard computing platform and (2) autonomy algorithm affects the maximum safe velocity of the robot, thus confirming the need for cyber-physical co-design.

We consider a baseline aerial robot that has a thrust-to-weight ratio of 2.4 [50], is equipped with a 60 FPS camera sensor, and weighs 1350 g, including the weight of the sensor, body frame, and battery. The robot is human-teleoperated; it comes with a microcontroller unit but has limited computing and memory capacity for autonomy algorithms beyond the flight controller stack. Since this onboard compute system does not use a hardware accelerator, we refer to this baseline as "No-Acc". Such a robot can achieve a maximum acceleration of 15.95 m/s², annotated as "Body Roof" in Fig. 5. The vertical red dotted line in Fig. 5 denotes the sensor throughput (f_s).

We augment the baseline robot configuration with four different off-the-shelf accelerators that have varying compute capabilities: Nvidia Xavier, Nvidia TX2, Intel NCS, and Ras-Pi 3b. These systems are selected because they are used in real aerial robots [51]–[54]. Therefore, in addition to the "No-Acc" baseline, we create four other robot configurations, each using a different accelerator, while the rest of the mechanical parameters (e.g., sensor) remain the same as the "No-Acc" baseline. Two autonomy algorithms that have been used for aerial robots in prior work are selected to run on these four configurations: VGG-16 [13] and DroNet [12].

Compute is heavy, and weighs down the aerial robot's agility.
High-performance onboard compute can process the autonomy algorithms faster, but it trades off this performance against higher TDP and weight, which in turn lower the maximum acceleration (a_max). Table I shows the maximum acceleration for each of the four robot configurations when using the different accelerator-based computing platforms. Since the Xavier (high performance ⇒ high TDP ⇒ larger heatsink) is the heaviest of the four, it shows the lowest acceleration, while the Ras-Pi and Intel NCS (low performance ⇒ low power ⇒ lighter heatsink) achieve the highest. However, these peak acceleration values are still lower than the "No-Acc" baseline acceleration of ~16 m/s², implying that it is important to consider the effect of compute weight on a robot's maximum acceleration.

High-performance compute does not imply a high-performance aerial robot.
A high-performance onboard compute platform does not always translate to higher robot performance (e.g., velocity or mission energy). For instance, Fig. 5a shows DroNet running on the four different onboard compute platforms. In this case, the low-performance NCS can achieve a higher velocity than the high-performance TX2 and Xavier, as shown by their rooflines. This is because both the TX2 and Xavier have higher TDPs and thus heavier heatsinks, which lower the maximum acceleration and, in turn, the velocity. The NCS is over-designed in performance at a lower power (its f_action is to the right of its knee-point) and thus achieves a higher velocity by being lighter than the TX2 and Xavier. In the case of the Ras-Pi, even though it is lighter than the TX2 and Xavier, its performance is lower (f_action is to the left of the knee-point), making it compute-bound, which lowers the velocity.
Computationally intensive algorithms need high-performance compute.
Fig. 5b shows the ceilings for the platforms running VGG-16 (the Ras-Pi 3b runs out of memory for VGG-16). The action throughputs of the Xavier, TX2, and NCS are dominated by their compute latencies, which are higher than the sensor latency. The Xavier achieves a higher action throughput of 28 Hz compared to the TX2 (10 Hz) and NCS (1.3 Hz). For the Xavier, TX2, and NCS, the velocity is compute-bound, as the action throughput is to the left of the roofline's knee-point. However, the Xavier is the least compute-bound among these accelerators, since its action throughput is closest (within 3.5%) to its roofline's knee-point. As a result, the Xavier achieves a higher maximum velocity than the other accelerators. It is still not an optimal choice of compute, however, as its velocity (9.56 m/s) is far from the baseline No-Acc maximum velocity (11.64 m/s) due to its weight.

Platform   | TDP (W) | Weight (g) | Heatsink weight (g) | Robot base weight (g) | Max. acceleration (m/s²)
No-Acc     | —       | —          | —                   | 1350                  | 15.95
Xavier     | <30     | 280 [55]   | 162                 | 1350                  | 11.58
TX2        | <15     | 85 [56]    | 81                  | 1350                  | 14.40
Ras-Pi     | —       | — [57]     | —                   | 1350                  | 15.60
Intel NCS  | <1      | 42 [58]    | 5.4                 | 1350                  | 15.10

TABLE I: Targeted computing platforms/control algorithms.

Fig. 5: F-1 roofline plots for the two end-to-end learning models, (a) DroNet and (b) VGG-16, running on the Nvidia Xavier, Nvidia TX2, Intel NCS, and Ras-Pi (AscTec Pelican drone; body rooflines at a = 15.95 m/s² for No-Acc, 11.58 for Xavier, 14.40 for TX2, 15.10 for NCS, and 15.60 for Ras-Pi). C.T. is the throughput of running an algorithm on a given hardware, shown only when it is greater than f_s (denoted by the vertical red line); the action throughput in these cases is equal to f_s.

Takeaway.
While high performance ensures that velocity is not compute-bound, low power dissipation translates into lower weight (a smaller heat sink) and hence supports a higher a_max (a higher roofline). Given that the action throughputs of these commonly used autonomy algorithms and computing platforms are not optimal, we need algorithm-hardware co-design to achieve design points close to the knee point.
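The reasoning above can be sketched as a small classification routine. This is an illustrative reading of the F-1 model, not the paper's implementation; the 29 Hz knee point below is a hypothetical value chosen near Xavier's reported 28 Hz throughput, which the text says is within 3.5% of its knee.

```python
def classify_design(compute_throughput_hz, sensor_rate_hz, knee_point_hz):
    """Place a design on the F-1 roofline (illustrative sketch).

    The robot cannot act faster than it can sense or compute, so the
    action throughput is the minimum of the two rates.
    """
    f_action = min(compute_throughput_hz, sensor_rate_hz)
    if f_action < knee_point_hz:
        # Left of the knee: something in the cyber pipeline is the bottleneck.
        return "compute-bound" if compute_throughput_hz < sensor_rate_hz else "sensor-bound"
    # At or right of the knee: velocity is capped by body dynamics (a_max).
    return "body-dynamics-bound"

# Xavier running DroNet: ~28 Hz compute throughput vs. a hypothetical 29 Hz knee
print(classify_design(28.0, 60.0, 29.0))  # -> compute-bound
```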
V. AutoPilot
Our F-1 analysis motivates the need to determine the best platform (i.e., autonomy algorithm and accelerator design) that will result in a knee-point action throughput while considering drone body dynamics and sensor type. To this end, we introduce the
AutoPilot cyber-physical co-design framework. Given a robot's high-level specification, such as its thrust-to-weight ratio, sensor type, and target task/environment, the tool automatically finds the optimal NN policy and its accelerator to ensure robust navigation and maximize safe velocity.

AutoPilot consists of three phases (Fig. 6). Phase 1 takes an input specification of the robot, trains various neural network (NN) policies for a given task/environment, and measures the effectiveness of these policies in terms of success rate. Phase 2 performs an automated design space exploration (DSE) to find candidate NN policies and accelerator architectures that are optimal in terms of success rate and hardware power/performance. Phase 3 then uses the F-1 performance model to select, from the Phase 2 candidates, the NN policy and accelerator design that maximize velocity and success rate.
A. Phase 1: Specification and Training
In Phase 1, the user provides an input
Specification and configures the NN training environment via the
Air Learning
NN training gym. The specification consists of all the inputs to the AutoPilot framework, such as the robot task, environment, optimization target (velocity), the robot's physical properties, etc.
[Fig. 6 diagram: Phase 1 (Air Learning training, Section 4.1) launches parallel Air Learning instances and fills an Air Learning database with NN parameters; Phase 2 (design space exploration engine, Section 4.2) couples Bayesian optimization with a cycle-accurate simulator over NN and HW parameters; Phase 3 (CPS co-design with the F-1 model, Section 4.3) applies compute weight modelling and, if the knee point is not reached, architectural fine-tuning with a bag of architectural optimizations (frequency scaling, technology scaling), yielding the optimal policy + hardware accelerator for deployment. Example specification: success rate > 90%, sensor frame rate [30, 60] FPS, TDP [1-10] W, thrust-to-weight ratio [1.5-3], optimization target: velocity.]
Fig. 6:
AutoPilot design methodology for automating cyber-physical co-design in aerial robots.
The Air Learning training simulator [59] is used to train different NN policies for a given environment.
Specification.
There are three main categories within the specification. The first is the robot task-level specification, such as the success rate. The second includes specifications of the target CPS system: the sensor framerate, the rigid-body dynamics (thrust-to-weight ratio), the power of the rotors/body/sensors, etc. The last is the optimization target, such as maximizing velocity or the number of missions, which AutoPilot uses to determine the final NN policy and hardware accelerator architecture.
Reinforcement-Learning Training.
AutoPilot uses Air Learning [59] to train and validate learning-based autonomy algorithms for a given robot task. Air Learning provides a high-quality implementation of reinforcement learning algorithms that can be used to train an NN policy for aerial robot navigation tasks. It includes a configurable environment generator [60] with domain randomization [61] support that allows changing various parameters, such as the number of obstacles, the size of the arena, etc. We customize these parameters to generate different environments, with a varying number of obstacles, to represent changes in task complexity.

To determine the NN policy for each robot task (environment complexity in obstacles, congestion, etc.), we start with the basic template used in Air Learning [14] and vary its hyperparameters (number of layers/filters) to create many candidate NN policies. Based on the specified robot task and the desired success rate, AutoPilot launches several Air Learning training instances in parallel for the different NN policy candidates. Each NN policy that achieves the required success rate is evaluated in a random environment to validate its task-level functionality. The validated NN policies are stored in an Air Learning database along with their success rates, which are then used by Bayesian optimization in the next DSE phase.
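The filtering step at the end of Phase 1 can be sketched as follows. The record fields and threshold here are hypothetical stand-ins for the actual Air Learning database schema.

```python
def build_policy_database(candidates, required_success_rate):
    """Keep only validated policies that meet the required success rate."""
    database = []
    for policy in candidates:
        if policy["success_rate"] >= required_success_rate:
            database.append({
                "id": policy["id"],
                "hyperparams": policy["hyperparams"],  # e.g., layers, filters
                "success_rate": policy["success_rate"],
            })
    return database

# Hypothetical training results from two parallel Air Learning instances
candidates = [
    {"id": "nn-a", "hyperparams": {"layers": 3, "filters": 32}, "success_rate": 0.86},
    {"id": "nn-b", "hyperparams": {"layers": 2, "filters": 16}, "success_rate": 0.71},
]
db = build_policy_database(candidates, required_success_rate=0.80)
# db retains only "nn-a"
```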
B. Phase 2: Design Space Exploration
In Phase 2, an automated multi-objective DSE is performed to find NN policies and hardware accelerator architectures that are optimal in terms of success rate and accelerator performance/power for a target environment. The success rate is affected only by the NN hyperparameters (e.g., number of layers/filters). The accelerator's runtime and power depend on both the NN and the accelerator microarchitecture parameters (number of processing elements, on-chip memory, etc.). Success rates for the NN policies are read from the Air Learning database, while a cycle-accurate simulator evaluates accelerator performance/power for the different policies and hardware configurations. To achieve rapid convergence to optimal solutions without performing an exhaustive search, Bayesian optimization is used to tune the different parameters.
Air Learning Database.
This database stores the training results for the various NN policies trained using Air Learning. Each entry has an NN policy identifier, the hyperparameters used for training, and the performance of the NN policy validated for a given task. Example performance metrics are the success rate and the number of steps the aerial robot takes to reach the goal.
Cycle-Accurate Hardware Simulator.
AutoPilot uses SCALE-Sim, a configurable systolic-array-based cycle-accurate DNN accelerator simulator [62]. It exposes various microarchitectural parameters, such as array size (number of MAC units), array aspect ratio (array height vs. width), scratchpad memory sizes for the input feature maps (ifmaps), filters, and output feature maps (ofmaps), and dataflow mapping strategies, as well as system integration parameters such as memory bandwidth. Taking these architectural parameters, the filter dimensions of each DNN layer, and the image size as input, SCALE-Sim generates the latency, utilization, SRAM accesses, DRAM accesses, and DRAM bandwidth requirement.

While SCALE-Sim only generates performance metrics for the hardware accelerator, we augmented it with power models. The SRAM power is estimated using CACTI [63], and the DRAM power is estimated using Micron's DDR4 power calculator [64]. We assume the accelerator is integrated into the final SoC. The details of the SoC-level integration and the estimation of SoC power are in Section VI.
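A hedged sketch of how the simulator's access counts might be folded into an accelerator power estimate. The per-access energies and per-PE power below are placeholder constants for illustration, not CACTI or Micron outputs.

```python
def estimate_power_w(sram_accesses, dram_accesses, num_pes, runtime_s,
                     e_sram_j=5e-12, e_dram_j=100e-12, p_pe_w=2e-3):
    """Combine memory access energies and PE power into average power (W).

    Energy per access (J) times access count, divided by runtime, gives
    average memory power; the systolic array contributes per-PE power
    times the array size.
    """
    sram_power = sram_accesses * e_sram_j / runtime_s
    dram_power = dram_accesses * e_dram_j / runtime_s
    pe_power = num_pes * p_pe_w
    return sram_power + dram_power + pe_power

# Hypothetical counters for one inference pass over a 16x16 array
estimate_power_w(sram_accesses=1e9, dram_accesses=1e8, num_pes=256, runtime_s=0.04)
```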
Bayesian Optimization.
AutoPilot uses Bayesian optimization [15] for multi-objective DSE to generate task-system Pareto frontiers. Bayesian optimization has been shown to be highly effective for optimizing black-box functions [65], [66] that are expensive to evaluate and cannot be expressed as closed-form expressions. BayesOpt can achieve faster convergence than genetic algorithms when optimizing multiple objectives [67]. In AutoPilot, BayesOpt optimizes three objective functions: (i) task success rate, (ii) SoC power, and (iii) accelerator inference latency (runtime). A Pareto-optimal design is one that achieves maximum task success rate and minimum inference latency and SoC power. The algorithm tunes NN policy hyperparameters (such as the number of layers/filters) and accelerator hardware parameters (e.g., number of processing elements, SRAM sizes, etc.) to converge to Pareto-optimal NN policies and accelerator architectures. An open-source BayesOpt implementation [15] is used in AutoPilot.
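The Pareto-optimality criterion over these three objectives can be made concrete with a minimal dominance filter. This is an illustrative post-processing step over sampled designs, not the BayesOpt implementation itself.

```python
def dominates(a, b):
    """a dominates b: no worse in all objectives, strictly better in one.

    Success rate is maximized; power and latency are minimized.
    """
    no_worse = (a["success"] >= b["success"] and
                a["power"] <= b["power"] and
                a["latency"] <= b["latency"])
    strictly = (a["success"] > b["success"] or
                a["power"] < b["power"] or
                a["latency"] < b["latency"])
    return no_worse and strictly

def pareto_front(designs):
    """Keep the designs not dominated by any other design."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]

# Hypothetical sampled designs: (success rate, SoC power W, latency s)
designs = [
    {"success": 0.85, "power": 2.0, "latency": 0.040},
    {"success": 0.85, "power": 4.0, "latency": 0.040},  # dominated by the first
    {"success": 0.90, "power": 6.0, "latency": 0.010},
]
front = pareto_front(designs)  # first and third designs survive
```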
C. Phase 3: Cyber-Physical Co-Design with F-1
The goal of Phase 3 is to find a design point (policy and accelerator) with optimal success rate and velocity. There are two steps involved: CPS co-design and architectural tuning.
CPS Co-Design.
First, the designs with the highest success rate (the minimum success rate is user-specified) are selected from the Phase 2 output. Then, the velocities for these designs are computed using the CPS relation (Section III-C), which accounts for the effect of the weight of the different components, including the compute, on velocity.

Next, AutoPilot constructs the F-1 roofline plot (following Section III-C), which consists of a roof corresponding to the baseline robot (i.e., human-operated, with no onboard NN accelerator) and other roofs corresponding to the velocities of the success-rate-filtered designs. The latter roofs are close to or lower than the base roof due to the added weight of the accelerators. Finally, the design is selected that achieves the maximum velocity, equivalent to that of the human-operated base robot, and whose action throughput equals the base knee-point throughput.
Architectural Fine-Tuning.
When no design achieves the base knee-point velocity, some architectural tuning may be required to shift a design closer to the knee point. AutoPilot provides two options for choosing which points to optimize: (i) they can be user-defined, or (ii) the design point closest to the knee point can be selected. The tuning applies a variety of optimizations until the optimized design sits at (or very close to) the base knee point in the F-1 roofline.

We employ a bag of architectural optimizations in the tuning process. AutoPilot comes with two techniques: frequency scaling and technology scaling. In frequency scaling, we increase or decrease the operating frequency to trade off the performance and power of the hardware accelerator. Lowering the frequency leads to lower power (TDP), which reduces the heat-sink weight and increases a_max and velocity; this is useful when a design is body-dynamics bound and over-designed. Likewise, increasing the operating frequency improves accelerator runtime and can be used when a design is under-designed and compute-bound. In technology scaling, we evaluate designs in different process technology nodes to see whether a design can be moved closer to the knee point.
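The frequency-scaling trade-off can be sketched under two simplifying assumptions: throughput scales linearly with clock, and power scales roughly linearly with frequency (dynamic power dominant). The 90 Hz starting throughput is hypothetical; the 1 GHz / 7.5 W starting point matches design '3' in Section VII-C.

```python
def frequency_scale(throughput_hz, power_w, f_old_hz, f_new_hz):
    """Linear first-order model of frequency scaling (illustrative only)."""
    scale = f_new_hz / f_old_hz
    return throughput_hz * scale, power_w * scale

# Design '3': 1 GHz, 7.5 W, body-dynamics bound; scale down to 125 MHz
tput, power = frequency_scale(throughput_hz=90.0, power_w=7.5,
                              f_old_hz=1e9, f_new_hz=125e6)
# power drops to ~0.94 W under this linear model (the paper reports ~1 W),
# shrinking the heat sink and raising a_max
```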
Summary.
The AutoPilot methodology is general (ML-based multi-objective DSE) and can be extended in scope to include other autonomous vehicles such as cars (with their CPS models [68], [69]), other autonomy algorithms (Section II-D), and other hardware targets (e.g., FPGAs, CGRAs, multi-cores, systolic/non-systolic arrays, etc.). Within a fixed accelerator target, any other architectural optimization technique that trades off power and performance (e.g., policy quantization [70], model compression [71], memory optimizations [72]) can be added to the bag of architectural optimizations.
VI. Experimental Setup
Air Learning Training Environments.
We generate two environments with varying degrees of clutter using the Air Learning environment generator. The arena size is typical and is twice the arena sizes used in aerial robotics testbeds [73]-[76]. The NN is trained using Deep Q-Networks (DQN) [77], which work well on high-level navigation tasks for aerial robots [78], [79]. We use the same reward function and other hyperparameters as the authors of Air Learning [59]. Training is terminated after 1 M steps or once the required success rate is reached.
NN Policy Architecture Search.
We use the Air Learning model architecture as the baseline template and change its hyperparameters. The NN policy is multi-modal, and prior work [59] has shown that each input modality contributes to the success rate for the task. The basic template of the architecture used in that work is shown in Fig. 7a. We made additional changes to the base template, such as the choice of filter sizes, strides, etc. We choose a filter size of 3 ×

SoC Power Estimation.
We assume an SoC that includes the hardware accelerator architecture template shown in Fig. 7b. To estimate the total SoC power, we add the power of the individual components in the SoC. To estimate the power of the hardware accelerator, we run a given NN policy on the cycle-accurate simulator, which produces SRAM traces, DRAM traces, and the number of read/write accesses to the SRAM and DRAM. Using the SRAM and DRAM trace information, we model the SRAM power in CACTI [63] and the DRAM power with the Micron DRAM model [64]. To estimate the power of the systolic array, we multiply the array size by the energy of a PE; the PE power is modeled after the breakdown in [81].

For the ULP camera, we assume the camera can sustain frame rates of up to 60 FPS at a 144 x 256 image size, at a low power of less than 100 mW and a form factor of 6.24 mm ×
(a) NN policy hyperparameters. (b) Accelerator sub-system.
Fig. 7: (a) NN policy tuned with Bayesian optimization. (b) The accelerator consists of a systolic array of PEs and on-chip buffers for storing the input activations (IFMAP SRAM), filter weights (Filter SRAM), and output feature maps (OFMAP SRAM).

The Cortex-M cores receive the high-level action commands from the accelerator sub-system through the system bus after each frame is run through the NN policy. The NN produces the action, which the flight controller interprets to generate the low-level motor actuation signals that control the aerial robot.
Compute Weight Estimation.
Using the SoC power as the heat source, we calculate the required heat-sink volume with a heat-sink calculator [87]. The weight of the heat sink is the estimated volume multiplied by the density of aluminum (a commonly used heat-sink material). We also assume the final SoC is mounted on a PCB with all electrical components weighing 20 g (which, per our analysis, is typical for Ras-Pi [57] and CORAL [88] like systems).
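The compute-weight model described above can be sketched as follows. The cm^3-per-watt coefficient is a placeholder for the heat-sink calculator's output; the aluminum density (~2.70 g/cm^3) and the 20 g board weight follow the text.

```python
ALUMINUM_DENSITY_G_PER_CM3 = 2.70
PCB_AND_COMPONENTS_G = 20.0  # typical for Ras-Pi/CORAL-class boards (per the text)

def compute_weight_g(tdp_w, cm3_per_watt=2.0):
    """Estimate the compute payload weight added to the drone.

    Heat-sink volume is taken proportional to TDP (placeholder coefficient),
    then converted to weight via the density of aluminum.
    """
    heatsink_volume_cm3 = tdp_w * cm3_per_watt
    heatsink_weight = heatsink_volume_cm3 * ALUMINUM_DENSITY_G_PER_CM3
    return heatsink_weight + PCB_AND_COMPONENTS_G

compute_weight_g(tdp_w=4.0)  # 4 W * 2 cm^3/W * 2.70 g/cm^3 + 20 g = 41.6 g
```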
VII. Evaluation
We present the results and analysis of AutoPilot (i.e., compute DSE, CPS co-design, and architectural fine-tuning). We then show that SoCs optimized for velocity lead to an increase in the total mission count.
A. Compute Design Space Exploration (DSE)
Since off-the-shelf components fall short of optimal, we demonstrate that AutoPilot can automatically explore a large design space to find optimal NN policies and accelerator designs. We show the system's ability to generate a variety of policies and architectures by subjecting AutoPilot to environments with varying levels of obstacle density. Increasing complexity affects both the NN policy (deeper policies) and the hardware accelerator design. Fig. 8 shows the designs obtained using AutoPilot for two task complexities (low and high obstacle density). Each design point represents the SoC power, DNN accelerator inference latency, and success rate (color map).

As described in Section V, AutoPilot uses Bayesian optimization to tune the various parameters until convergence while optimizing the costs (performance, power, and success rate). While the NN policy determines the success rate, the accelerator power and performance depend on both the policy and the HW parameters. AutoPilot converges to optimal accelerator designs by sampling less than 0.5% of the total design space. AutoPilot tunes the NN policies such that they have 2-6 layers, with each layer having 32, 48, or 64 filters. For
the more complex task, AutoPilot automatically selects deeper NN policies to reach a comparable success rate. For instance, 32 filters (and 3-5 layers) are sufficient to achieve a success rate above 80% for low obstacle density, whereas 48 filters are required for high obstacle density to reach a similar success rate.

AutoPilot tunes the hardware accelerator parameters to generate designs ranging from low-power to high-performance; we specifically tune the array height/width between 16 and 128 and the SRAM (ifmap/ofmap/filter) sizes between 32 KB and 2 MB. Fig. 8 highlights three regions in the DSE to show how AutoPilot can generate hardware accelerator candidates under given power-performance bounds irrespective of task complexity. Regions A, B, and C denote bounds of under 2 W (25 FPS), 2-4 W (50 FPS), and 4-8 W (100 FPS), respectively. As task complexity changes, AutoPilot can generate a multitude of design candidates within the same power-performance bounds. As we co-design cyber-physical parameters, having multiple design candidates translates to greater scalability of the methodology in selecting optimal compute platforms as the sensor or body dynamics change (Section VII-B).

Fig. 8: DSE for environments with varying complexity: (a) low obstacle density; (b) high obstacle density. Regions A, B, and C are marked in each panel.
B. Cyber-Physical Co-Design
While compute DSE generates a large spread of architectural designs, not all points are suited for deployment on an aerial robot to achieve a balanced system (as shown in Section IV using F-1). Hence, in this section we show that (1) the F-1 model is essential for finding the accelerator architecture, based on a user specification (e.g., drone type, sensor framerate), that leads to optimal robot velocity, and (2) architectures optimized for raw performance or low power do not necessarily land at the optimal knee point (maximum velocity).

For a comprehensive analysis, we perform CPS co-design with three aerial robots: the AscTec Pelican (mini-UAV), the DJI Spark (micro-UAV), and a nano-UAV [89], which have thrust-to-weight ratios (including battery/sensor) of 2.4, 1.9, and 3.1, respectively, representing a range of body dynamics. We also consider sensor framerates of 30 and 60 FPS.

Fig. 9 shows CPS co-design for the navigation task in the high-density environment. We filter the design points from Fig. 8b by high success rate, as shown in Fig. 9a. These designs represent accelerator candidates for the NN policy that achieves a success rate of at least 83.4% (4 layers and 32 filters); a success rate greater than 80% is nominal [33], [90] for aerial robot navigation tasks.

Fig. 9: (a) Filtering design points by success rate. (b) AscTec Pelican, (c) DJI Spark, and (d) nano-drone [89], each with a 30 FPS (top) and 60 FPS (bottom) sensor: cyber-physical co-design using the F-1 model for high-obstacle-density navigation. C.T. is the throughput of an algorithm, shown only when it is greater than f_s (the action throughput in these cases is equal to f_s). K.P., C.B., and B.D. denote knee point, compute-bound, and body-dynamics bound, respectively.

Of the many accelerator design candidates, we highlight four designs, denoted ‘1’ (lowest power, slowest runtime), ‘2’ (AutoPilot-selected), ‘3’ (highest performance, highest power), and ‘4’ (AutoPilot-selected). The architectural details of these design points, such as the systolic array size and IFM/filter memory, are annotated in Fig. 9a. Using these four points, we demonstrate the need for the F-1 model when designing onboard compute for aerial robots. We also show that cyber-physical co-design is critical to achieving compute platforms that maximize velocity, rather than isolated hardware design objectives such as high performance, low power, or energy efficiency.
F-1 model identifies optimal design points.
Plotting the four architectural design points on the F-1 roofline model for the AscTec Pelican (Fig. 9b), DJI Spark (Fig. 9c), and nano-UAV [89] (Fig. 9d), with 30 FPS and 60 FPS sensor framerates, we observe that the balanced, high-performance, and low-power design points are all far from the optimal knee point for their respective aerial robots. Instead, design point ‘2’, selected by AutoPilot, is the optimal knee point for the AscTec Pelican with a 60 FPS sensor. For the AscTec Pelican with a 30 FPS sensor, design point ‘4’ is optimal in terms of compute: any further improvement in compute performance yields no improvement in velocity, since performance is bound by the sensor framerate (30 FPS). For the DJI Spark with 30 FPS and 60 FPS sensors, and for the nano-drone [89] with a 30 FPS sensor, AutoPilot selects ‘4’ as the optimal compute design. For the nano-drone with a 60 FPS sensor, however, ‘4’ is not the optimal knee point and results in a compute-bound scenario.

Using the F-1 model in the CPS co-design phase, we show that ad hoc selection of high-performance compute designs such as ‘3’ can degrade the overall performance (e.g., high-speed velocity) of the drone. For instance, the highest-performance accelerator design point ‘3’ (highest power) decreases the safe velocity of the DJI Spark and nano-UAV [89] by 13.2% and 44%, respectively, due to the added weight of the heat sink, compared to the baseline No-Acc case. Thus, the F-1 performance model lets us pick the optimal design rather than the typical low-power, high-performance, or balanced architecture that would often result if the compute were designed in isolation without CPS co-design.
One-size compute does not fit all.
For the AscTec Pelican (Fig. 9b) with 30 FPS and 60 FPS sensor framerates, the optimal design points are ‘4’ and ‘2’, respectively. Interestingly, if ‘4’ (optimal for the 30 FPS framerate) is chosen as the compute platform, it becomes compute-bound for the AscTec Pelican at 60 FPS, which drops the maximum safe velocity by 15% compared to design point ‘2’. Takeaway.
Choosing the computing platform (either general-purpose or custom-designed) in an ad hoc fashion can deteriorate the physical performance of the robot, with implications for mission energy (discussed in Section VII-D). Hence, when designing (or selecting) the computing platform for an aerial robot, one must account for the robot's cyber-physical parameters to achieve maximum performance.
C. Architectural Fine-Tuning
To show the effectiveness of architectural fine-tuning, we consider the AscTec Pelican with a 60 FPS sensor, but assume that the knee point (i.e., design ‘2’ in Fig. 9b) was not achieved. In this case, using the bag of architectural optimizations (frequency/node scaling), we are able to move the sub-optimal body-dynamics-bound and compute-bound designs (points ‘3’ and ‘4’ in Fig. 10a) to the knee point.
Body-Dynamics Bound.
Design ‘3’ (high-power, high-performance, and body-dynamics bound), clocked at 1 GHz in a 45 nm process node, has a compute throughput 3 × higher than the knee point for the AscTec Pelican. By scaling its frequency down to 125 MHz, AutoPilot brings this sub-optimal point closer to the knee point (denoted ‘3 ∗ ’), as shown in Fig. 10b. Lowering the frequency from 1 GHz to 125 MHz reduces power consumption from 7.5 W to 1 W (Fig. 10a). The power reduction shrinks the heat-sink requirement, making this design lighter and near-optimal. Compute-Bound.
Design ‘4’, clocked at 1 GHz in the 45 nm node, is compute-bound for the AscTec Pelican. To bring this design to the knee point, AutoPilot increases the accelerator's throughput by ∼ × without significantly increasing its power consumption: by scaling the process to 22 nm and clocking at 4 GHz, AutoPilot brings it closer to the knee point (Fig. 10b). Takeaway.
When the CPS co-design step (Section VII-B) cannot generate the optimal knee-point design, the architectural fine-tuning engine can be launched, using various optimization techniques to deliver the final knee-point design. The level of flexibility AutoPilot allows and the trade-offs it makes are configurable by the end user.

Fig. 10: Architectural fine-tuning for the AscTec Pelican: (a) DSE plot; (b) F-1 plot.
D. Mission Time/Energy Implications of Optimal System
The end goal of AutoPilot is to choose an onboard compute system (design point) that minimizes mission time and energy. To this end, we evaluate three robots: the AscTec Pelican (mini-UAV), the DJI Spark (micro-UAV), and the nano-UAV used in Zhang et al. [89]. We show that the optimal design point (i.e., the knee point) generated by AutoPilot always outperforms the non-optimal designs (other AutoPilot-generated designs or ad hoc selections of onboard compute, e.g., the TX2).
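The comparisons in this subsection follow directly from Eq. (5): mission energy is mission time multiplied by total power draw, and mission time is distance over safe velocity. A sketch with illustrative (not measured) power numbers:

```python
def mission_energy_j(distance_m, v_safe_m_s, p_rotors_w, p_compute_w, p_others_w):
    """Eq. (5): E_mission = t_mission * (P_rotors + P_compute + P_others)."""
    t_mission = distance_m / v_safe_m_s
    return t_mission * (p_rotors_w + p_compute_w + p_others_w)

# A heavier, higher-TDP compute lowers v_safe, raising mission energy even
# though its own power draw is a small fraction of the total (rotors dominate).
knee = mission_energy_j(100.0, 10.0, p_rotors_w=200.0, p_compute_w=2.0, p_others_w=5.0)
adhoc = mission_energy_j(100.0, 5.0, p_rotors_w=200.0, p_compute_w=8.0, p_others_w=5.0)
# knee = 2070 J, adhoc = 4260 J: roughly 2x lower mission energy at the knee point
```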
Mission Time Comparisons.
To estimate mission time, we assume a package delivery mission in which a radius of 100 m separates the source and destination. We pick two categories of points, namely the knee point and others (compute-bound, body-dynamics bound, and ad hoc selections of computing platform), for the three aerial robots. For each design point, we estimate the maximum velocity the robot achieves when using that design as its onboard compute.

Fig. 11a shows mission times (lower is better) for five different computing platforms across the AscTec Pelican (mini-UAV), DJI Spark, and nano-UAV. The AutoPilot-generated optimal design (knee point) always achieves the lowest mission time. Notably, selecting the knee-point design becomes more critical as the aerial robot is miniaturized (mini-UAV ⇒ micro-UAV ⇒ nano-UAV). For the AscTec Pelican (mini-UAV), the mission-time improvement between the AutoPilot-generated knee point and the body-dynamics-bound design point (also generated by AutoPilot) is only 5%, whereas for the micro-UAV and nano-UAV the difference is 20% and 80%, respectively. The improvement for the AscTec Pelican is marginal because it is a larger drone with a higher payload capacity; the 4 W TDP difference between the knee-point and body-bound design points, and the associated extra heat-sink weight, is too small to significantly degrade its body dynamics (a_max) and safe velocity (Eq. 4). For the DJI Spark and nano-UAV [89], however, the payload capacity is lower, and the extra heat-sink weight (compute TDP) can significantly lower the acceleration (a_max) and safe velocity. Mission Energy Comparisons.
Fig. 11b shows the mission energy for the three drone platforms with five different compute platforms. The knee-point design always lowers mission energy compared to the other selections. Mission energy (E_mission) is related to mission time as follows:

E_mission = t_mission × (P_rotors + P_compute + P_others)    (5)

where t_mission is the time to complete the mission, and P_rotors, P_compute, and P_others are the power consumed by the rotors, the onboard compute, and the other components (sensor, flight controller, etc.) of the aerial robot. It is important to note that P_rotors consumes more than 95% of the total power [11], [29], but a higher P_compute (higher TDP ⇒ heavier heat sink) can lower the acceleration (a_max), which lowers V_safe (and raises t_mission). Thus, the knee-point design lowers mission energy by minimizing t_mission (higher V_safe ⇒ lower t_mission) and minimizing P_compute compared to the other design points (compute-bound, body-bound, or ad hoc selections). The optimal designs generated by AutoPilot for the AscTec Pelican, DJI Spark, and nano-UAV achieve 2 ×, 1.54 ×, and 1.81 × lower mission energy, respectively, compared to the other designs.

Fig. 11: Comparison of AutoPilot-generated points with other designs (the TX2 is marked "Does Not Fit" in both panels). All points except P-DroNet (PULP-DroNet) [29] run the same policy. For P-DroNet, we use the numbers reported in their work.

VIII. Related Work
Performance Models.
Analytical performance models, such as multicore Amdahl's law [91], the Roofline model [36], Gables [17], and several others [92], are useful for guiding the design of an optimal system for a given workload. These models target traditional compute and are not explicitly aimed at robots, which have both cyber and physical components. Our work proposes a roofline-like model to help understand the role of computing in aerial robots. In the context of performance modelling for complex systems (i.e., beyond compute-only systems), cote [93] is a full-system model for the design and control of nano-satellites. The cote model takes into account orbital mechanics and physical bounds on communication, computation, and data storage to design a cost-effective, low-latency, and scalable nano-satellite system. The F-1 model has a similar objective: it combines the interactions between compute/sensor (cyber components) and body dynamics (physical components) to understand the various bottlenecks in building an optimal system.
Accelerators for Robots.
Recently, a low-power accelerator [29] was proposed for neural-network-based control, but that work is customized for nano-drones running DroNet [12]. Our work provides a general methodology to generate multiple NN policies and hardware accelerator designs from a high-level specification. Navion [94] is a specialized accelerator for aerial robots in the sense-plan-act control paradigm, aimed at improving visual-inertial odometry; we focus on end-to-end control algorithms, an emerging autonomy paradigm. RoboX [95] generates an accelerator for motion predictive control from a high-level DSL. Though the high-level goal is the same, our work differs in that RoboX does not consider the effect of cyber-physical parameters on the computing platform; we instead contribute the F-1 model to quantify the optimality of our designs. Outside of aerial robots, prior work [96], [97] has shown the benefits of custom hardware accelerators for motion planning algorithms for robotic arms. Though the robots differ, AutoPilot, guided by the F-1 model, can build similarly optimal motion planning hardware accelerators targeted at aerial robots.
IX. Conclusion
AutoPilot is a push-button solution that automates cyber-physical co-design to generate an optimal control algorithm (NN policy) and its hardware accelerator from a high-level user specification. The concepts developed for AutoPilot, such as cyber-physical co-design, the F-1 model for identifying the optimal design point, architectural fine-tuning, and selecting design points by their effect on the overall mission, can be adapted to other types of autonomous robots, such as self-driving cars.
References
[1] A. Timothy, M. N. Paul, A. T. Aaron, B. Joan, and S. Jeff, "Drone transportation of blood products," TRANSFUSION Journal. arXiv preprint arXiv:2009.06034, pp. 7832-7839, IEEE, 2018.
[9] G. Loianno, D. Scaramuzza, and V. Kumar, "Special issue on high-speed vision-based autonomous navigation of UAVs," Journal of Field Robotics, vol. 1, no. 1, pp. 1-3, 2018.
[10] S. Li, M. M. Ozo, C. De Wagter, and G. C. de Croon, "Autonomous drone race: A computationally efficient vision-based navigation and control strategy," arXiv preprint arXiv:1809.05958, 2018.
[11] B. Boroujerdian, H. Genc, S. Krishnan, W. Cui, A. Faust, and V. J. Reddi, "MAVBench: Micro aerial vehicle benchmarking," 2019.
[12] A. Loquercio, A. I. Maqueda, C. R. Del-Blanco, and D. Scaramuzza, "DroNet: Learning to fly by driving," IEEE Robotics and Automation Letters, vol. 3, no. 2, pp. 1088-1095, 2018.
[13] F. Sadeghi and S. Levine, "CAD2RL: Real single-image flight without a single real image," arXiv preprint arXiv:1611.04201, 2016.
[14] S. Krishnan, B. Boroujerdian, W. Fu, A. Faust, and V. J. Reddi, "Air Learning: An AI research platform for algorithm-hardware benchmarking of autonomous aerial robots," CoRR, vol. abs/1906.00421, 2019.
[15] M. Havasi and J. M. Lobato, "Bayesian optimization," https://github.com/cambridge-mlg/gem5-aladdin/tree/master/bo script, 2018.
[16] E. D. Lazowska, J. Zahorjan, G. S. Graham, and K. C. Sevcik, Quantitative System Performance: Computer System Analysis Using Queueing Network Models. Prentice-Hall, Inc., 1984.
[17] M. Hill and V. J. Reddi, "Gables: A roofline model for mobile SoCs," in Proceedings 2007 IEEE International Conference on Robotics and Automation, pp. 361-366, IEEE, 2007.
[21] W. Koch, R. Mancuso, and A. Bestavros, "Neuroflight: Next generation flight control firmware," arXiv preprint arXiv:1901.06553, pp. 1484-1491, IEEE, 2016.
[24] R. B. Rusu and S. Cousins, "3D is here: Point Cloud Library (PCL)," pp. 1-4, IEEE, 2011.
[25] A. Elfes, "Using occupancy grids for mobile robot perception and navigation," Computer, vol. 22, no. 6, pp. 46-57, 1989.
[26] M. G. Dissanayake, P. Newman, S. Clark, H. F. Durrant-Whyte, and M. Csorba, "A solution to the simultaneous localization and map building (SLAM) problem," IEEE Transactions on Robotics and Automation, vol. 17, no. 3, pp. 229-241, 2001.
[27] S. Karaman and E. Frazzoli, "Sampling-based algorithms for optimal motion planning," The International Journal of Robotics Research, vol. 30, no. 7, pp. 846-894, 2011.
[28] D. Gonzalez, J. Perez, V. Milanes, and F. Nashashibi, "A review of motion planning techniques for automated vehicles," IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 4, pp. 1135-1145, 2016.
[29] D. Palossi, A. Loquercio, F. Conti, E. Flamand, D. Scaramuzza, and L. Benini, "A 64mW DNN-based visual navigation engine for autonomous nano-drones," IEEE Internet of Things Journal.
— et al., "End to end learning for self-driving cars," arXiv preprint arXiv:1604.07316, 2016.
[32] S. Ross, N. Melik-Barkhudarov, K. S. Shankar, A. Wendel, D. Dey, J. A. Bagnell, and M. Hebert, "Learning monocular reactive UAV control in cluttered natural environments," pp. 1765-1772, IEEE, 2013.
[33] N. Smolyanskiy, A. Kamenev, J. Smith, and S. Birchfield, "Toward low-flying autonomous MAV trail navigation using deep neural networks for environmental awareness," pp. 4241-4247, IEEE, 2017.
[34] D. Kalashnikov, A. Irpan, P. Pastor, J. Ibarz, A. Herzog, E. Jang, D. Quillen, E. Holly, M. Kalakrishnan, V. Vanhoucke, et al., "QT-Opt: Scalable deep reinforcement learning for vision-based robotic manipulation," arXiv preprint arXiv:1806.10293, 2018.
[35] A. Kendall, J. Hawke, D. Janz, P. Mazur, D. Reda, J.-M. Allen, V.-D. Lam, A. Bewley, and A. Shah, "Learning to drive in a day," pp. 8248-8254, IEEE, 2019.
[36] S. Williams, A. Waterman, and D. Patterson, "Roofline: an insightful visual performance model for multicore architectures,"
Communicationsof the ACM , vol. 52, no. 4, pp. 65–76, 2009.[37] M. Watterson and V. Kumar, “Safe receding horizon control for aggressivemav flight with limited range sensing,” in , pp. 3235–3240,IEEE, 2015.[38] S. Liu, M. Watterson, K. Mohta, K. Sun, S. Bhattacharya, C. J. Taylor,and V. Kumar, “Planning dynamically feasible trajectories for quadrotors sing safe flight corridors in 3-d complex environments,” IEEE Roboticsand Automation Letters , vol. 2, no. 3, pp. 1688–1695, 2017.[39] “Field of view.” https://en.wikipedia.org/wiki/Field of view, 2020.[40] R. Mahony, V. Kumar, and P. Corke, “Multirotor aerial vehicles: Modeling,estimation, and control of quadrotor,”
IEEE Robotics and Automationmagazine , vol. 19, no. 3, pp. 20–32, 2012.[41] J. Wei, J. M. Snider, J. Kim, J. M. Dolan, R. Rajkumar, and B. Litkouhi,“Towards a viable autonomous driving research platform,” in , pp. 763–770, IEEE, 2013.[42] F. Santoso, M. A. Garratt, and S. G. Anavatti, “Visual–inertial navigationsystems for aerial robotics: Sensor fusion and technology,”
IEEETransactions on Automation Science and Engineering , vol. 14, no. 1,pp. 260–275, 2016.[43] L. Smith and V. Aitken, “The auxiliary extended and auxiliary unscentedkalman particle filters,” in , pp. 1626–1630, IEEE, 2007.[44] A. I. Mourikis and S. I. Roumeliotis, “A multi-state constraint kalmanfilter for vision-aided inertial navigation,” in
Proceedings 2007 IEEEInternational Conference on Robotics and Automation , pp. 3565–3572,IEEE, 2007.[45] H. Wang, H. Zhao, J. Zhang, D. Ma, J. Li, and J. Wei, “Survey onunmanned aerial vehicle networks: A cyber physical system perspective,”
IEEE Communications Surveys & Tutorials
Machine vision and applications , vol. 27, no. 7, pp. 1005–1020, 2016.[50] S. Liu, M. Watterson, S. Tang, and V. Kumar, “High speed navigationfor quadrotors with limited onboard sensing,” in
IEEE internationalconference on robotics and automation (ICRA) arXiv:1906.00421 , 2019.[60] https://github.com/harvard-edge/airlearning, title = Air Learning Envi-ronment Generator, year = 2020.[61] J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel,“Domain randomization for transferring deep neural networks fromsimulation to the real world,” in , pp. 23–30, IEEE, 2017.[62] A. Samajdar, Y. Zhu, P. Whatmough, M. Mattina, and T. Krishna, “Scale-sim: Systolic cnn accelerator simulator,” arXiv:1811.02883 , 2018.[63] S. Li, K. Chen, J. H. Ahn, J. B. Brockman, and N. P. Jouppi, “Cacti-p:Architecture-level modeling for sram-based structures with advancedleakage reduction techniques,” in
Proceedings of the InternationalConference on Computer-Aided Design ∼ /media/documents/products/power-calculator/ddr4 power calc.xlsm, 2016.[65] J. Snoek, H. Larochelle, and R. P. Adams, “Practical Bayesian optimiza-tion of machine learning algorithms,” in NIPS , pp. 2960–2968, 2012. [66] B. Shahriari et al. , “Taking the human out of the loop: a review ofBayesian optimization,”
Proceedings of the IEEE , pp. 148–175, 2016.[67] B. Reagen et al. , “A case for efficient accelerator design space explorationvia Bayesian optimization,” in
ISLPED , 2017.[68] S. Shalev-Shwartz, S. Shammah, and A. Shashua, “On a formal modelof safe and scalable self-driving cars,” arXiv preprint arXiv:1708.06374 arXiv preprint arXiv:1510.00149 , 2015.[72] L. Pentecost, M. Donato, B. Reagen, U. Gupta, S. Ma, G.-Y. Wei, andD. Brooks, “Maxnvm: Maximizing dnn storage density and inferenceefficiency with sparse encoding and error mitigation,” in
Proceedings ofthe 52nd Annual IEEE/ACM International Symposium on Microarchitec-ture , MICRO ’52, (New York, NY, USA), pp. 769–781, Association forComputing Machinery, 2019.[73] S. Lupashin, M. Hehn, M. W. Mueller, A. P. Schoellig, M. Sherback, andR. D’Andrea, “A platform for aerial robotics research and demonstration:The flying machine arena,”
Mechatronics , vol. 24, no. 1, pp. 41–54,2014.[74] N. Michael, D. Mellinger, Q. Lindsey, and V. Kumar, “The grasp multiplemicro-uav testbed,”
IEEE Robotics & Automation Magazine , vol. 17,no. 3, pp. 56–65, 2010.[75] J. P. How, J. Teo, and B. Michini, “Adaptive flight control experimentsusing raven,”
Simulation , vol. 1, p. 1.[76] I. Palunko, P. Cruz, and R. Fierro, “Agile load transportation: Safeand efficient load manipulation with aerial robots,”
IEEE robotics &automation magazine , vol. 19, no. 3, pp. 69–79, 2012.[77] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wier-stra, and M. Riedmiller, “Playing atari with deep reinforcement learning,” arXiv preprint arXiv:1312.5602 , 2013.[78] R. Polvara, M. Patacchiola, S. Sharma, J. Wan, A. Manning, R. Sutton,and A. Cangelosi, “Toward end-to-end control for uav autonomous land-ing via deep reinforcement learning,” in , pp. 115–123, IEEE, 2018.[79] C. Yan, X. Xiang, and C. Wang, “Towards real-time path planningthrough deep reinforcement learning for a uav in dynamic environments,”
Journal of Intelligent & Robotic Systems , Sep 2019.[80] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classificationwith deep convolutional neural networks,” in
Advances in neuralinformation processing systems , pp. 1097–1105, 2012.[81] H. Li, M. Bhargav, P. N. Whatmough, and H. . Philip Wong, “On-chipmemory technology design space explorations for mobile deep neuralnetwork accelerators,” in
IEEE Transactions on ConsumerElectronics , vol. 56, pp. 1185–1190, Aug 2010.[85] “ARMv8-M Technical Reference manual.”[86] “Arm Cortex-M33.”[87] “Heat sink size calculator.” https://celsiainc.com/resources/calculators/heat-sink-size-calculator/. (Accessed on 01/29/2020).[88] “Coral-som-datasheet, howpublished = https://coral.ai/static/files/coral-som-datasheet.pdf, month = , year = , note = .”[89] X. Zhang, B. Xian, B. Zhao, and Y. Zhang, “Autonomous flight control ofa nano quadrotor helicopter in a gps-denied environment using on-boardvision,”
IEEE Transactions on Industrial Electronics , vol. 62, no. 10,pp. 6392–6403, 2015.[90] A. Giusti, J. Guzzi, D. C. Cires¸an, F.-L. He, J. P. Rodr´ıguez, F. Fontana,M. Faessler, C. Forster, J. Schmidhuber, G. Di Caro, et al. , “A machinelearning approach to visual perception of forest trails for mobile robots,”
IEEE Robotics and Automation Letters , vol. 1, no. 2, pp. 661–667, 2015.[91] M. D. Hill and M. R. Marty, “Amdahl’s law in the multicore era,”
Computer , vol. 41, no. 7, pp. 33–38, 2008.
92] E. J. Kim, K. H. Yum, and C. R. Das, “Introduction to analytical models,”
Performance Evaluation and Benchmarking , p. 193, 2018.[93] B. Denby and B. Lucia, “Orbital edge computing: Nanosatellite constel-lations as a new class of computer system,” in
Proceedings of the Twenty-Fifth International Conference on Architectural Support for ProgrammingLanguages and Operating Systems , pp. 939–954, 2020.[94] A. Suleiman, Z. Zhang, L. Carlone, S. Karaman, and V. Sze, “Navion:A 2-mw fully integrated real-time visual-inertial odometry acceleratorfor autonomous navigation of nano drones,”
IEEE Journal of Solid-StateCircuits , vol. 54, no. 4, pp. 1106–1119, 2019.[95] J. Sacks, D. Mahajan, R. C. Lawson, and H. Esmaeilzadeh, “Robox: anend-to-end solution to accelerate autonomous control in robotics,” in
Proceedings of the 45th Annual International Symposium on ComputerArchitecture , pp. 479–490, IEEE Press, 2018.[96] S. Murray, W. Floyd-Jones, Y. Qi, G. Konidaris, and D. J. Sorin, “The mi-croarchitecture of a real-time robot motion planning accelerator,” in , pp. 1–12, IEEE, 2016.[97] S. Murray, W. Floyd-Jones, Y. Qi, D. J. Sorin, and G. D. Konidaris,“Robot motion planning on a chip.,” in
Robotics: Science and Systems ,2016.,2016.