Probabilistic Localization of Insect-Scale Drones on Floating-Gate Inverter Arrays
Priyesh Shukla, Ankith Muralidhar, Nick Iliev, Theja Tulabandhula, Sawyer B. Fuller, Amit Ranjan Trivedi
11 Probabilistic Localization of Insect-Scale Drones onFloating-Gate Inverter Arrays
Priyesh Shukla , Student Member, IEEE , Ankith Muralidhar , Nick Iliev , Theja Tulabandhula , Sawyer B.Fuller and Amit Ranjan Trivedi , Member, IEEE
Abstract —We propose a novel compute-in-memory (CIM)-based ultra-low-power framework for probabilistic localization ofinsect-scale drones. Localization is a critical subroutine for path-planning and rotor control in drones, where a drone is required tocontinuously estimate its pose (position and orientation) in flyingspace. The conventional probabilistic localization approaches relyon the three-dimensional (3D) Gaussian Mixture Model (GMM)-based representation of a 3D map. The GMM map model issynthesized by scanning the 3D space and by modeling the densityof matter. A GMM model with hundreds of mixture functions istypically needed to adequately learn and represent the intricaciesof the map. Meanwhile, localization using complex GMM mapmodels is computationally intensive. Since insect-scale dronesoperate under extremely limited area/power budget, continuouslocalization using GMM models entails much higher operatingenergy – thereby, limiting flying duration and/or size of the dronedue to a larger battery. Addressing the computational challengesof localization in an insect-scale drone using a CIM approach,we propose a novel framework of 3D map representation usinga harmonic mean of “Gaussian-like” mixture (HMGM) model.We show that short-circuit current of a multi-input floating-gateCMOS-based inverter follows the harmonic mean of a Gaussian-like function. Therefore, the likelihood function useful for dronelocalization can be efficiently implemented by connecting manymulti-input inverters in parallel, each programmed with theparameters of the 3D map model represented as HMGM. Whenthe depth measurements are projected to the input of theimplementation, the summed current of the inverters emulatesthe likelihood of the measurement. We have characterized ourapproach on an RGB-D indoor localization dataset. The averagelocalization error in our approach is ∼ ∼ ∼ µ Wpower while processing a depth frame in 1.33 ms over hundredpose hypotheses in the particle-filtering (PF) algorithm used tolocalize the drone. Comparatively, a custom-designed 8-bit digitalprocessor requires ∼ ∼ × less energythan the traditional, paving the way for tiny autonomous drones. Index Terms —Compute-in-memory (CIM), insect-scale drones,probabilistic localization.
I. I
NTRODUCTION
Spatially intelligent devices can autonomously traverse andexplore their application domain, thereby, opening intriguingprospects for sensors and embedded systems. For example,
The authors are with Department of Electrical and Computer Engineering,University of Illinois at Chicago, IL 60607 USA, Department of Informationand Decision Sciences, University of Illinois at Chicago, IL 60607 USA, and Department of Mechanical Engineering, University of Washington, Seattle,WA 98115 USA. Fig. 1: Insect-scale drone [12] weighing less than a gram and smaller than apenny was demonstrated by Fuller lab in the University of Washington. in an agriculture field, flying cameras can locate infectedplants to prevent disease spread [1], [2]; flying gas sensorscan identify gas leaks in an industrial plant [3], [4]; duringearthquakes, autonomous cameras can assist in search andrescue missions by navigating through building debris [5], [6];and inside a home, a flying camera can actively monitor firebreakouts and intruders [7], [8]. Thus, most embedded sys-tems, enlivened by spatial intelligence, can have dramaticallyelevated use-cases. However, the flying devices must also besmall enough to be non-intrusive to people in the flying spacesand be able to navigate through constricted spaces. Recently,impressive progress has been made in miniaturizing drones.So called insect-scale drones studied in [9]–[12] weigh lessthan a gram and are smaller than a penny. Meanwhile, sincean insect-scale drone can only carry the payload of a tinybattery, minimizing power dissipation of on-board processingfor spatial intelligence becomes extremely critical [13]. Whilethe cloud can be leveraged to circumvent the above on-boardprocessing challenges, in many applications, reliable contin-uous connectivity to the cloud cannot be assumed. Cloud-based control may also suffer from unpredictable latency.Moreover, continuous transmission of drone’s video stream tocloud itself is energy hungry [14]. Hence, on-board ultra-low-power processing capacity for self-navigation is indispensable.The most basic operation for self-navigation is to determinethe position and orientation (i.e., pose) of a drone duringits flight. Path planning objectives such as motion trackingand obstacle avoidance require one to continuously assess a r X i v : . [ c s . R O ] F e b the drone’s pose. Therefore, drone localization must also beevaluated in real-time. Additionally, since the flying spaceof a drone is dynamic, drone localization must be robustagainst dynamic variations in the flying space. For example,in an indoor application, there will be movement of peopleand changes in lighting conditions. Localization of a dronemust be robust against these variations. Bayesian probabilisticinference is a promising approach for enhancing the robust-ness of inference by extracting both the prediction and the confidence of prediction [15]–[17]. A probabilistic frameworkfor drone localization models the likelihood of measurementagainst a hypothesized drone’s pose model [18]. The pos-terior probability of drone’s pose can be compared againstcompeting hypotheses to identify the most likely one. Thelikelihood models can also estimate the prediction confidencebased on the variance of the prediction. A high confidenceprediction will have low variance in estimates, and vice versa .Nonetheless, current state-of-the-art probabilistic localizationmodels are quite computationally intensive. While predictiverobustness of a probabilistic localization is highly desirablein an insect-scale drone, incorporating the same is quitechallenging due to limited computational resources as wellas the need to predict in the real-time. Overcoming the abovechallenges, this work makes the following key contributions: • We propose a novel representation of 3D maps using aharmonic mean of “Gaussian-like” mixture (HMGM) model.We show that, unlike a Gaussian Mixture Model (GMM),HMGMs can be computed with much higher efficiency andenable a similar localization accuracy as GMM. We alsodiscuss an Expectation Maximization (EM)-based learningprocedure for HMGM’s extraction. • We use the key property that the short-circuit current ofa multi-input floating gate CMOS inverter circuit followsthe harmonic mean of Gaussian-like functions. Therefore,by emulating the key computing kernels in our localizationapproach directly at the transistor level, we are able tominimize computing workload and data movements in theprocessing, and thereby, energy demand for the computing.Using the proposed implementation, the localization model isevaluated with high parallelism, therefore, timing and energyconstraints for real-time predictions that were hard to achieveare now feasible. • We have characterized our localization approach usingan RGB-D dataset. The average localization error in ourhardware-based approach is ∼ ∼ ∼ µ W power when processing adepth frame in 1.33 ms over hundred pose hypotheses usingparticle-filtering (PF). Comparatively, a custom-designed 8-bit digital processor consumes ∼ × higher than our approach. We have also rigorously an-alyzed the impact of process variability and limited precisionin our design, and document methodologies to mitigate theirimpact. II. O VERVIEW OF P ROBABILISTIC L OCALIZATION
Localization of an autonomous mobile agent invites un-certainties in its pose estimates. A probabilistic localizationframework can rigorously account for such modeling andmeasurement uncertainties to not only predict the estimatedpose but also the confidence of prediction, which is expressedby the variance of the predicted estimate. Extracting predictionconfidence along with the prediction itself allows for a risk-aware navigation, wherein the drone can seek additionalmeasurements if the prediction confidence is low.Using Bayes’ rule, the posterior probability of pose ( x ) ofa drone is given by P ( x | z ) = P ( z | x ) P ( x ) P ( z ) (1)Here, P ( z | x ) is the likelihood of measurement z and P ( x ) is the prior probability of pose belief. A maximum likelihoodestimate (MLE) of drone’s pose maximizes the likelihood ofthe current measurement, i.e., x MLE = argmax P ( z | x ) (2)Meanwhile, the maximum a posteriori estimate (MAP) of posesearches for the estimate that maximizes posterior probabilityof the pose, i.e., x MAP = argmax P ( z | x ) P ( x ) (3)Under global localization, when the drone is completelyuncertain about its pose, MAP and MLE are equivalent. Other-wise, MAP estimate allows for a systematic way to account forprior belief. Using probabilistic localization, a drone can alsorecursively integrate sequential measurements in a systematicframework to reduce its estimation uncertainties. Bayesianfiltering [19] is a well-established localization method for thispurpose where the belief of drone’s pose can be estimatedusing a recursive flow of Bayes’ rule as bel ( x t ) = (cid:88) x t − P ( x t | u t , x t − ) bel ( x t − ) (4a) bel ( x t ) = ηP ( z t | x t ) bel ( x t ) (4b)Here, bel ( x t ) in (4a) is the drone’s new pose belief at iteration‘ t .’ bel ( x t − ) is the pose belief at previous iteration ‘ t − .’ u t is the control input to the drone. P ( x t | u t , x t − ) is theprobabilistic motion model exploiting Markov assumptionwhere belief of x t depends only on the previous state’s belief,and the most recent control signal u t . (4b) is the correctionon the new pose belief based on the sensor measurements.Particle Filtering [20], [21] allows a practical implementa-tion of Bayesian filtering under arbitrary likelihood and belieffunction profiles by using a Monte Carlo method. Instead ofrelying on analytically defined likelihood and belief functions,the method considers a set of hypotheses (aka particles) sam-pled based on the underlying density functions. Each particlehas an associated weight index which is updated based on themeasurements following the above equations. Unlikely parti-cles with low likelihood are sequentially filtered out with eachmeasurement. Various uncertainties, such as in drone’s motioncontrol, are accounted for by regenerating a new particle setfrom the current, where children particles represent stochasticdeviations corresponding to motion control uncertainties.Although Bayes’ rule-based probabilistic localization allowsa systematic framework to seamlessly integrate single-shot V X V Y V Z DACs Log-ADCI LL HMGM modellingBayesian (Particle)fltering-based drone's poseestimation Log-likelihoodbu ff ers CT transistorsprogrammed withHMGM parameters Log-likelihood evaluationsstored in bu ff ersDepth scan to 3D coordinatesprojection model Column current ~ harmonic mean o f Gaussian-like f unction Pose trajectory
Precision controlled DACs and ADC
N-bit precision
EM-based parameter extraction Compute-in-memory block f or log-likelihood evaluation
3D point-cloud
Fig. 2: Compute-in-memory based ultra-low-power Monte-Carlo localization framework for insect-scale drones. The frameworks inputs sequential depthprojections. CIM is programmed with harmonic mean of ”Gaussian-like” mixture (HMGM) model parameters. CIM computes likelihood of depth measurementscorresponding to hypothesized pose which is further used in particle filtering (PF) to predict pose of drone.
MLE/MAP estimates of drone poses as well as sequentiallocalization steps (Bayesian filtering) as discussed above, theprocedure can be quite computationally intensive. The mostdominant complexity is the estimation of likelihood term P ( z | x ) itself since the measurements (images/depth maps) arehigh dimensional and models correlating drone’s pose with themeasurements can be quite complex depending on the size andintricacies of the 3D map.A popular approach to extract P ( z | x ) is by modeling the3D map using a Gaussian mixture model (GMM) [22]–[24].Using 3D scanning devices, such as Microsoft Kinect [25],[26], a point-cloud data-based 3D map of the domain can becaptured. A GMM is synthesized to model the point-clouddata which essentially represents the density of “matter” inthe 3D map. To adequately model intricacies of the 3D map,a sufficient number of mixture functions in the GMM arenecessary. During flight, depth sensors in a drone capture adepth map of its current observation. Although, depth sensingis energy expensive, recent works such as [27], have shownenergy efficient, small form-factor depth sensors, suited forinsect-scale drones. Based on the current pose belief, the scan z of N non-zero depth map pixels { z , z , ..., z N } is projectedto 3D via the camera’s projection model: C c = (cid:34) o x + f C x C z o y + f C y C z (cid:35) (5a) C w = [ C c − T cw ] R − (5b)Here, C x , C y and C z are the point coordinates in 3Dspace. Their 2D perspective projection in the camera frameis (cid:16) f C x C z , f C y C z (cid:17) . C c is the depth pixel coordinate matrix incamera frame mounted on the insect-scale drone. f is the focallength, and o x and o y are optical offsets. T cw and R cw aretranslation and rotation matrices, respectively, transformingdepth scans in camera frame to the world frame based onthe pose belief. Thereby, P ( z | x ) is estimated by evaluating p GMM ( z proj ( x , z ); Θ ) , where z proj ( · ) is the projection func- tion of measurements z to 3D space based on pose belief x . Θ are the GMM parameters.In a typical evaluation of pose localization, the likelihood p GMM ( z proj ( x , z ); Θ ) is computed for hundreds of non-zerodepth pixels and over hundreds of mixture functions inthe GMM model. Therefore, the computational workload of P ( z | x ) is excessive. Moreover, during a flight, the drone needsto continuously localize itself; hence, the timing constraintson P ( z | x ) are also stringent. In the subsequent section, wediscuss a novel approach to considerably minimize the energydemand for measurement likelihood evaluation which even-tually leads to a viable low power probabilistic localizationapproach for insect-scale drones.III. U LTRA - LOW - POWER P ROBABILISTIC L OCALIZATION
Prior works [22]–[24] use a GMM to probabilisticallyrepresent a 3D space, i.e., model the density of matter inthe 3D map. However, evaluation of likelihood models basedon GMM is computationally intensive. A dimension of eachmixture function in GMM requires subtractions, multipli-cations, additions (to compute the exponent of a Gaussianfunction) as well as look-ups (to add exponents using a log-ADD table [28]–[30]). Since these operations are repeated onall dimensions, mixture functions as well as the hypotheses,likelihood evaluation of a typical depth map may require tensof thousands of arithmetic operations and hundreds of memorylook ups to evaluate the likelihood of a single measurement.
A. A Mixture of Harmonic Mean of Gaussian-like Functions
We discuss a novel representation of 3D maps which canconsiderably simplify the computation of measurement like-lihoods. Consider the six transistor inverter design controlledby three input voltages, V X , V Y , and V Z , in Figure 3(a). Thetransistors in the inverter comprise a floating gate to programthreshold voltage [31]–[33]. Therefore, by programming thecharge density in the floating gate, the threshold voltage of I INV V X V Y V Z Charge-trap (CT) transistorVDD (a) I I N V ( no r m a li z e d ) GATE
HMG-fit6T-Invcurrent I
INV (V X )I INV (V Y )I INV (V Z ) (b) Log ( I I N V ) in log domain0 - 8- 4 I I N V ( no r m a li z e d ) V Y V X (e) GATE I I N V ( no r m a li z e d ) ∆V TH -0.1 0 0.1 (c) X V Y HMG-fit 6T-INV (d)
Harmonic mean ofbivariate Gaussian-like(HMG) function
Fig. 3: (a) 6T-inverter circuit to emulate Harmonic mean of Gaussian-like(HMG) function. (b) I
INV behaves Gaussian-like upon varying one of theinput voltages - V X , V Y , or V Z . (c) Programming I INV -peak by controllingthreshold voltage ( ∆ V TH in Volts). (d-e) Contour and surf plots with bi-variateinput (V X , V Y ) control. transistors can be programmed in a non-volatile manner. InFigure 3(b), the inverter current, I INV , through series-connectedtransistors emulates a Gaussian-like function. The figure showsI
INV when varying one of the input voltage – V X , V Y , or V Z – while keeping the others fixed.Notably, the Gaussian-like nature of inverter’s short circuitcurrent can be understood by representative transistor’s currentequations. In Figure 3(b), close to the peak, inverter’s shortcircuit current is governed by equating NMOS and PMOScurrent in the saturation regime, i.e. I N,SAT ≈ β n V dr,n (1 + λ n V OUT ) (6a) I P,SAT ≈ β n V dr,p (1 + λ p ( V DD − V OUT )) (6b)Here, V dr,n = V IN − V TH,n is NMOS’s drive voltage and V dr,p = V DD − V IN − V TH,p is PMOS’s drive voltage. V
TH,n/p is thethreshold voltage of NMOS/PMOS, V IN is the input voltageto the inverter, λ is the channel length modulation parameter,and β n and β p are the transconductance coefficients of NMOSand PMOS, respectively. Since I N,SAT = I P,SAT , V
OUT and V IN are related as V OUT ≈ β p V dr,p (1 + λ p V DD ) − β n V dr,n λ n β n V dr,n + λ p β p V dr,p (7) Therefore, inverter’s short circuit current near the peak isdetermined by applying V OUT expression from (7) to (6a).Under the setting, β p = β n and negligible channel lengthmodulation, inverter’s current near the peak is given by I INV,near peak ≈ β n V dr,n V dr,p V dr,n + V dr,p (8)Therefore, the peak current voltage, V peak , is centered at V dr,n = V dr,p , i.e., ( V DD + V TH,n − V TH,p ) / . At V TH,n = V TH,p , V peak is at V DD / . V peak can be modulated by ∆ amount by programminginverter’s threshold voltages as V TH,n → V TH,n +∆ and V TH,p → V TH,p − ∆ . Under these settings, the peak current magnitudeis the same, and only V peak modulates. Also, note that closeto the peak, I INV,near peak has a quadratic dependence to V IN ,just like a Gaussian function has to the input variable near itspeak.Away from the peak, inverter’s current is governed byNMOS’s (PMOS’s) subthreshold current, i.e., I INV,tail ≈ I ,n exp (cid:16) V dr,n nV T (cid:17) (9)Here, V T is the thermal voltage and I ,n is the leakage currentat V dr,n = 0 . Therefore, at the tail, inverter’s current decaysexponentially. However, unlike in a Gaussian function, theexponent is linearly dependent on the argument variable ( V IN ).Therefore, we can model inverter’s short circuit current usinga “Gaussian-like” function defined as I INV ≈ I exp (cid:16) − ( V IN − µ ) σ ( α + | V IN − µ | ) (cid:17) (10)Here, µ is the input voltage where inverter’s short circuitcurrent peaks and depends on V TH,n and V TH,p . α is a fit-ting parameter that models the transition from quadratic toexponential dependence of I INV at increasing/decreasing V IN . σ models the variance of I INV and it depends on the thermalvoltage and transistor’s ideality factor, n . Figure 3(b) showsthe correlation between the inverter current model in (10)and HSPICE-simulated inverter’s short circuit current in linearand log-domains. Figure 3(c) shows the modulation of µ byprogramming V TH,n and V TH,p in the inverter design.Figures 3(d) and (e) show the contour and surface plotsrespectively of I
INV when varying two gate voltages at atime. Under a multivariate control of inverter’s current, i.e,when more than one input voltages vary simultaneously, thecharacteristics of I
INV are, in fact, more closely represented bya Harmonic mean of Gaussian-like (HMG) function in (10)where each constituent 1D function is controlled by inputV X , V Y , and V Z . The column current in the design followsa harmonic mean of Gaussian since the resistance of eachinverter is inversely proportional to a Gaussian-like function.The column current can be expressed as I INV ≈ I (cid:80) i = X,Y,Z exp (cid:16) ( V i − µ i ) σ ( α + | V i − µ i | ) (cid:17) (11)Here, µ X , µ Y , and µ Z are the respective mean of Gaussian-like functions which are controlled by the threshold voltage oftransistors connected with the input voltage V X , V Y , and V Z ,respectively.Notably, compared to a multivariate Gaussian function,the implementation and evaluation of HMG in Figure 3 ismuch simplified. While a digital implementation to evaluate multivariate Gaussian will require multiplier, adder, and look-up tables for exponential or log-ADD function [34], [35],HMG in Figure 3 is implemented using only six transistors andcan be evaluated by applying analog voltages V X , V Y , and V Z .Also, compared to [36], the model can be implemented usingtypical CMOS transistors and doesn’t require modifications intransistor design and processes. As we will discuss later, fordrone localization, V X , V Y , and V Z will correspond to depthpixel’s projection along 3D spatial dimensions. A model withmixtures of HMG functions can also be simply implementedby connecting many multi-input inverters in Figure 3(a) in par-allel, each corresponding to a mixture component, and sharingtheir gate-input terminals. The total current from the parallelinverters will thus emulate the likelihood of measurementsapplied at the gates.The likelihood of 3D projection of depth scan, d meas , forlocalization can, therefore, be computed using HMGM as (cid:96) ( d meas | Θ ) = N (cid:88) i =1 ln M (cid:88) j =1 λ j I INV ( V iX , V iY , V iZ ; µ j , σ j , α j ) (12)Here, [ V iX V iY V iZ ] is the 3D projection of i th depth pixelto the world frame. Projection of depth scan to world framewas discussed in (5). Θ represents the parameter set forHMGM, comprising of λ i (mixing proportion), µ i (mean), σ i (standard deviation) and α i (fitting parameter) for eachrespective mixture component i . B. Expectation-Maximization for HMGM
Parameters of HMGM ( Θ ) can be learned by adaptingExpectation-Maximization (EM) [37] procedure as shown inAlgorithm ?? . We begin with randomly initializing Θ . In theE-step, we compute the expected value of the log likelihoodof 3D map’s point cloud dataset d pc , i.e., (cid:96) ( d pc | Θ t ) . In the M-step, we compute optimal parameters Θ ( t +1) that maximizesthe expected log likelihood formulated in the E-step, i.e., (a)(b) (c) Fig. 4: (a) Harmonic mean of Gaussian-like mixture (HMGM) functionrepresentation of an indoor scene point-cloud. (b) Model scores of HMGM andGMM during Expectation-Maximization (EM) iterations to optimize modelparameters over the point-cloud. (c) Model scores improve with increase inthe number of mixture components. Θ ( t + ) ≡ arg max Θ (cid:96) ( d pc | Θ t ) . E-M steps are repeated untilthe log-likelihood of HMGM converges.Under full programmability, variance σ j of each mixturefunction can be learned independently. However, in our sim-plified implementation the variance of each mixture functionis dictated by the technology and transistor parameters (width,length, etc ). Conversely, the proposed scheme can easily im-plement a larger number of mixture functions with limited re-source/power overhead. Therefore, we exploit a larger numberof mixture components in our implementation to compensatefor the lack of flexibility due to a tied variance. In Algorithm ?? , only the mean and weights of mixture components arelearned, therefore, and a predetermined σ based on 6T-invertercharacteristics is used.Figure 4(a) illustrates the fitting of Scene-1 in RGB-Ddataset [38] using our HMGM-based model implementedthrough inverter array. We use “model score” to quantify thegoodness of fit. The model score ( MS ) is evaluated as theaverage of HMGM function’s value on point-cloud data, d pc ,in the negative log-domain, i.e., MS = − N N (cid:88) i =1 ln M (cid:88) j =1 λ j I INV ( d pc ; Θ ( t ) ) (13)A higher model score indicates a better representation of point-cloud data. Figure 4(b) shows that model score for HMGMincreases with EM iterations and then plateaus. Figure 4(b)also shows the model score for GMM which shows similarfitting profile. In Figure 4(c), the model score is plotted atincreasing number of mixture components for both HMGMand GMM models. The model score for both models improveswith more mixture components and then plateaus. Althoughthe model score of conventional GMM model is better thanour HMGM model, the former model is much more complex V Z V Y V X λ c columns λ c columns Mapping HMGM mixture function weights λ j withproportionate number of columnsDACs Log-ADC I LL (a) SH G1G2V
REF1 V REF2 V IN S1S2S3Gain '00''01''11'Polarity Stage 1 Stage 2 Stage m Flash2b 2b 2b 2bN-bits1.5b Log ADCTime alignment + Digital correction (b)
GNDI LL R Fig. 5: (a) Architecture of floating gate (FG) transistor based compute-in-memory (CIM) based on HMGM to evaluate pose likelihood. (b) LogarithmicADC converting I LL into digitized log-likelihood. to implement, meanwhile, each mixture in our HMGM modelonly requires six transistors. C. Compute-in-memory Likelihood Estimator
Figure 5(a) shows the architecture of a compute-in-memory(CIM) implementation based on the Harmonic mean ofGaussian-like mixtures (HMGM) for ultra-low-power like-lihood evaluations. The architecture comprises of parallelcolumns, each constituting a multi-input inverter design inFigure 3. The gate voltages, V X , V Y , and V Z , are shared amongcolumns. Each column injects a current proportional to HMGto I LL line at the top. The current of all column lines aresummed to produce the net current mimicking the likelihoodestimation which is then converted to digital domain using anADC. Since each column in Figure 5(a) comprises transistorsof identical dimensions, the effect of mixture function weightis accommodated by mapping mixture functions with higherweights on proportionally more number of columns as shownin the figure.The architecture of N-bit logarithmic ADC is shown inFigure 5(b) that converts an analog input voltage (V IN ) intodigital bits based on the following equation [39].log (cid:16) V IN V range × C (cid:17) = b N − N − + ... + b N C (14)Here, C is the code efficiency factor where larger values of C emphasize smaller signals, resulting into higher dynamicrange. The logarithmic pipelined-ADC architecture is based onsimple scalar multiplication, not requiring complex operationslike squaring and exponents. To convert analog current, I LL ,representing likelihood of depth scan for the pose hypothesis,into analog voltage, we use an Op-Amp with resistive feedback as shown in Figure 5(b). This also adds voltage-bias to theinverter columns.The architecture in Figure 5 considerably simplifies like-lihood evaluations. Since the current of mixture functions isadded over a wire, a dedicated adder is not needed. Althoughthe implementation requires data converters: analog-to-digitalconverter (ADC) and digital-to-analog converter (DAC), theoverheads of ADC/DAC amortize on a larger scale processingarchitecture comprising many parallel columns. Note that theworkload of ADC/DAC does not increase even when the HMGcolumns in the processing array are increased to operate onhigher complexity maps. D. Programming of Likelihood Estimator
Transistors in the likelihood estimator array in Figure 5(a)require threshold voltage (V TH ) programming to map therespective HMG function. As we discussed earlier, the voltagefor peak current of inverter can be programmed by ∆ alongX, Y, and Z input dimensions by programming V TH of thecorresponding NMOS and PMOS transistors as V TH,n → V TH,n + ∆ and V TH,p → V TH,p − ∆ . Notably, programmingof our likelihood estimator is similar to NAND flashes [40].However, our design requires several key modifications sinceeach column combines transistors of both n and p-types,whereas NAND flashes are based on n-type transistors alone.In Figure 6(a), to increase V TH of NMOS, we select therespective column by applying a high voltage, VDP, acrossit. Unselected columns are held at a voltage V inhibit . A pro-gramming pulse is applied at the gate of the selected NMOS.A passing gate potential, V pass , is applied to the gate ofunselected transistors. V inhibit and V pass are chosen so that thegate to channel potential is not high enough in the unselectedtransistors to induce a carrier tunneling. These design consid-erations for V inhibit and V pass in our case are similar to NANDflashes. Under the above programming configurations, electronin the selected NMOS inject to its floating gate through hotcarrier injection [41], thereby, increasing its V TH . To decreaseV TH , the source end of the selected transistor is left floatingand high programming pulse is applied at the drain. Due toa higher negative voltage drop between the floating gate anddrain, electrons are detrapped by tunneling to the high voltagedrain terminal.In Figure 6(b), PMOS is programmed by applying potentialbetween the gate and n-well body voltage. Each column hasseparate n-wells and body terminals. Source-drain terminals inthe selected column are left floating during programming. V TH of the selected PMOS is decreased by applying programmingpulses at the gate of the selected transistor and keeping thebody grounded. Due to a high voltage between gate andchannel, majority carrier electrons tunnel from n-well bodyto the floating gate. PMOS rows are unselected by applyinga passing gate potential V pass , similar to the case in Figure6(a). Inverter columns are unselected by applying a inhibitorypotential V inhibit at the body terminal so that the potentialbetween gate and channel is not enough to induce tunneling.V TH of PMOS is increased by applying programming pulses atthe body terminal, thereby, detrapping electrons from floatinggate to drain terminal. VDP V
INHIBIT V INHIBIT p,pass V p,pass V p,pass V n,pass V n,pass Selected Unselected
NMOS V TH increase INHIBIT V INHIBIT
FloatV p,pass V p,pass V p,pass V n,pass V n,pass Selected Unselected
NMOS V TH decrease Hot electron injection
10 V0 V 0 V7 V
Drain ejection
12 V
Floating gate
Controlgate p-Substrate BlockingOxideTunnelOxideD S D S (a)
Float Float
FloatFloatV p,pass V p,pass V n,pass V n,pass Selected Unselected
PMOS V TH decrease Float FloatFloatV p,pass V p,pass V n,pass V n,pass Selected Unselected
PMOS V TH increase Channel injection
10 V0 V
Float
Channel ejection
Float
Floating gate
Controlgate V n,pass Float
D S SDN-wellSubstrate
Float
TunnelOxide BlockingOxide V n,pass (b) SeparateN-well V INHIBIT V INHIBIT Float
Fig. 6: Threshold voltage (V TH ) programming in floating-gate inverter columns so as to program the mean of Harmonic mean of Gaussian-like mixture(HMGM) components. (a) Configuration to increase and decrease NMOS V TH . (b) Configuration to increase and decrease PMOS V TH . Notably, PMOS’s programmability in the above schemerequires separate n-wells for each column which affects thearea efficiency of inverter array. However, this is not a criticalconcern for most applications since only a few thousandminimum-sized transistors are needed even for complex mapsin our approach. Moreover, while our previous discussionconsiders floating gate CMOS for mean programmability, theschemes can also be extended to charge trap transistors (CTT).CTT based on HfO -metal gates were recently demonstrated[33], programmable under 2 V. Due to lower voltage operation,with CTT, the programming circuits of inverter array can besimplified to improve overall area efficiency of the array.IV. C HARACTERIZATION ON B ENCHMARK D ATASET
We use three distinct scenes from the RGB-D ScenesDataset v2 [38] to characterize our proposed localizationframework. Each scene in the dataset is a 3D reconstructedpoint-cloud formed by aligning a set of video frames. Thedataset also contains true camera poses for each frame. Thecamera pose constitutes the orientation expressed as quater-nions (q, w, p, r), and position as (x, y, z) coordinates.We estimate drone’s pose using the particle filtering (PF)approach under both global and relative localization. Underglobal localization, the drone is completely uncertain aboutits pose to begin with. By accumulating sequential depthmeasurements, the drone tracks its movements using PF. Underlocal localization, the drone is aware of the initial pose.However, since the motion model of drone is stochastic, itneeds to track its movements based on depth measurementsand a known initial pose as it navigates through the flyingspace. Under global localization, we initialize our PF setupwith hundreds of hypothesized poses (i.e., particles) uniformly distributed throughout the map. All particles (say K in num-ber) are initialized with the same weight ( /K ) quantifyingthat each hypothesis is equally likely. We then project thescanned depth frame to 3D world coordinates based on aparticle j using (5). Corresponding to each particle, a unique3D projection of depth scan, d jmeas , is determined. A modelscore MS ( d jmeas | Θ ) for each particle-measurement pair iscomputed using our inverter array-based likelihood estimatorin Figure 5. By accumulating model score of particles, theirweights are updated as w j = MS ( d jmeas | Θ ) (cid:80) Kj =1 1 MS ( d jmeas | Θ ) (15)To eliminate less likely particles with each measurement,we resample them from the current particle set based on theirweights while keeping the total number of particles same as K , similar to [42]. The weight-update followed by resamplingresults into an updated estimate of drone’s pose with uncer-tainty in the pose illustrated by the variance of resampledparticles. We then reassign equal weights to new particlesand re-iterate all steps from weight-update till resamplingfor each subsequent depth scan. With this recursive Bayesianestimation, the particles eventually converge resulting intopose prediction with higher confidence. Figure 7 shows suchparticle filtering in action.Figure 8 shows the trajectory tracking results on Scene-01based on the approach. Figure 8(a) shows the point cloud mapof Scene-01 along with sample RGB and depth scans from thedrone. Figure 8(b) shows the trajectory tracking under globaland relative localization for our approach and the conventionalGMM-based approach. Notably, the tracking accuracy of ourapproach is similar to the conventional approach. Figure 8(c) Initialization Depth scan
Fig. 7: Particle filtering (PF) on Scene-01 of RGB-D dataset using theproposed compute-in-memory (CIM) based HMGM model of the scene. shows the confidence in pose estimation corresponding toevery input depth frame. Under global localization, due tohigh initial uncertainty, variance bounds on pose particles arehigh. However, with each increasing measurement, the drone isable to minimize its uncertainty by particle filtering and usingthe proposed HMGM-based map representation. The boundsconverge with subsequent scans illustrating improvement inprediction confidence. The error in position estimates is shownin Figure 8(d) with an upper error limit of ∼ . m. We alsoestimate drone’s pose by our localization framework on twomore distinct scenes in the dataset, which are illustrated inFigure 9.V. I MPACT OF L OW P RECISION AND P ROCESS V ARIABILITY
We next analyze the implications of quantization on posetracking of drone. The lower the operating precision, thelower is the area/power requirement which is critical fora drone with limited resource budget. However, excessivelylow precision can significantly degrade the accuracy of poseprediction. Figure 10 shows the impact of quantization ofDAC, logarithmic-ADC and threshold (mean) programmingon the pose estimation. DAC precision quantizes the 3D depthprojection to inverter array, i.e. V X , V Y and V Z . Log-ADCprecision quantizes the digitized log likelihood evaluated bythe inverter array. Precision on transistor thresholds quantizethe mean of HMGM mixture components and depends onthe precision of floating gate memory in the inverter array.Compared to floating point precision, the root mean squarederror (RMSE) of predicted poses at lower operating precisionis higher as shown in the figure. Our simulation results indicate (a) (b) (c) (d) Uniformly distributedpose hypotheses across the mapPose hypothesesconverge with subsequentdepth scans
Fig. 8: Trajectory tracking results on (a) Scene-01 of RGB-D dataset. (b)Drone’s pose under both global and local (or relative) localization based onboth HMGM and GMM. (c) Lower and upper bounds of pose hypotheses withsequential depth scans. (d) Pose error under both global and local localization. (b)
Scene-05 (a)(c) (d)
Scene-09
Fig. 9: Pose tracking results on (a, c) Scene-05, and (b, d) Scene-09 of RGB-Ddataset under both global and local localization settings. a non-monotonic change in RMSE at increasing precision, al-though, a higher precision generally helps in reducing RMSE.
Floating pointMeanLeft:Middle:Right: Scene-01Scene-05Scene-09 R M SE ( c m ) o f p r e d i c t e d po se Bar segments
Floating pointADCLeft:Middle:Right: Scene-01Scene-05Scene-09 R M SE ( c m ) o f p r e d i c t e d po se Bar segments (a)(b)
Floating pointDACLeft:Middle:Right: Scene-01Scene-05Scene-09 R M SE ( c m ) o f p r e d i c t e d po se Bar segments (c)
16b FP 6b 8b 12b 16b
Fig. 10: Implications of low-precision (a) Mean, (b) ADC, and (c) DAC onpose prediction (root mean squared error (RMSE)) characterized over threedistinct scenes of RGB-D dataset.
Nonetheless, even with very low precision in mean encoding,DAC, and ADC (such as 2-bit, 8-bit, and 8-bit, respectively),the pose estimation accuracy is competitive to the floatingpoint precision.We have also analyzed the impact of process variationon pose prediction in our approach, shown in Figure 11.Threshold voltage (V TH ) variations in the inverter transistorsdegrade the emulation of 3D map model using inverter array,thereby, resulting in inaccuracy in likelihood evaluation andfinally in pose estimation. The impact of V TH variability isanalyzed using Monte Carlo simulations. V TH of transistors inthe inverter array are randomly perturbed from the ideal basedon process statistics of V TH variability. For hundred randomcases, RMSE of the predicted trajectories and ground truthis analyzed. Based on the simulation results in Figure 11, σ (V TH ) <
20 mV is tolerable in our design and it does notdegrade accuracy significantly from the baseline. Variations inV TH of transistors can be controlled by increasing their sizes orprocess knobs such as reducing channel doping concentration. Scene-09Scene-05Scene-01 R M SE ( c m ) o f p r e d i c t e d po se σ( V TH ) (mV)10 20 30 40 50 Fig. 11: Process variation analysis on pose prediction at three distinct scenesfrom RGB-D dataset illustrating RMSE vs variance in threshold voltagevariation. (a) l og ( I LL ) Time (ns)250 260 270 ~ 11 ns0V 0.525VV X =V Y =V Z Stable I LL Settling timeInputactivation (b)
Fig. 12: (a) Floating gate CMOS memory latency to compute current-domainlikelihood of a 3D depth-pixel projection. (b) Power breakdown of Floatinggate CIM peripherals processing a depth frame over 100 pose hypotheses.
VI. P
OWER -P ERFORMANCE C HARACTERIZATION
We compute the power consumption and latency of ourframework by processing 640 ×
480 pixels depth frame fromthe RGB-D scenes dataset described earlier. Figure 12(a)shows an example transient current profile from the inverterarray when applying depth pixel’s projection as a step input.The operation of inverter array takes about ∼
11 ns to settle toa steady state value. Our considered 8-bit DAC requires ∼ ∼
46 ns. Thus, the total time tooperate one depth pixel is ∼
72 ns. Considering 100 particles inPF and on average ∼ % non-zero depth pixels (note that onlynon-zero depth pixels are projected and considered for likeli-hood evaluation), our framework takes ∼ SRAMμ & 1/σ Linear TransformDepth REG Pose REG Log(exp(x)+exp(y))procedureLog-Add LUTPose belief (H)Depth pixel (D) μ k k D j H i H i D j Mixture exponent E ijk E s u m Fig. 13: Digital pipeline implementing Gaussian mixture model (GMM) forpose likelihood evaluation. estimation is ∼ µW (at 45 nm CMOS process technology).We do not consider power contribution from the front-enddigital processing to project depth pixels to the 3D space, sincethe workload for the likelihood estimations is the dominantcomponent, exceeding by the orders of magnitude than thefront-end processing. Log-ADC with several comparators andOp-Amps at multiple pipeline stages consumes 74 % of thetotal power. The three DACs at the gate inputs contribute to19 % and the 100 column inverter array consumes 7 % of thetotal power as shown in the figure. The power breakdownalso indicates that with even more complex map representationmodel (i.e., the ones requiring more than 100 inverter columnsto implement), the total power dissipation in our approachwill not increase significantly from the current. To implementa more complex HMGM model, only more inverter columnswill be needed and three DACs and one log-ADC will sufficeas in the present configuration.Figure 13 shows a comparative digital pipeline implement-ing a Gaussian mixture model (GMM) to evaluate likelihoodof depth projections. We compare the performance of ourframework with this digital pipeline implemented in 45 nmCMOS [43]. The pipeline consists of two multipliers, oneadder, and one subtractor. The data flow is multiplexes foreach tuple of depth pixel, particle (pose hypothesis), andGMM mixture component. Therefore, even to process onemeasurement, tens of thousands of iterations of the pipelineare needed. Due to such higher performance requirement, atthe matching fps, the digital pipeline consumes ∼ ∼ × greater than our proposed FG-CIMframework. In our current estimate for the digital pipeline, weneglect the energy consumption for memory look-ups whichwill degrade the efficiency of conventional design even morecompared to ours. VII. C ONCLUSION
Energy efficient, low-latency, on-board processing of self-navigation algorithms is imperative in insect-scale drones. Ourproposed floating-gate compute-in-memory (FG-CIM) frame-work performs confidence-aware localization of the drone withultra-low-power consumption. The framework uniquely repre-sents the domain map with 3D harmonic mean of “Gaussian-like” mixture (HMGM) density model, characteristic of theinverter columns that construct FG-CIM. The scalable designprocesses depth frames for large number of pose hypotheses atvery high frames per second (FPS) due to massive parallelism, paving the way for robust, ultra-low-power spatial intelligencein insect-sized drones. R
EFERENCES[1] A.-K. Mahlein, “Plant disease detection by imaging sensors–parallelsand specific demands for precision agriculture and plant phenotyping,”
Plant disease , vol. 100, no. 2, pp. 241–251, 2016.[2] N. Chebrolu, P. Lottes, T. L¨abe, and C. Stachniss, “Robot localizationbased on aerial images for precision agriculture tasks in crop fields,”in .IEEE, 2019, pp. 1787–1793.[3] A. Kroll, W. Baetz, and D. Peretzki, “On autonomous detection ofpressured air and gas leaks using passive ir-thermography for mobilerobot application,” in . IEEE, 2009, pp. 921–926.[4] C. Heyer, “Human-robot interaction and future industrial robotics ap-plications,” in . IEEE, 2010, pp. 4749–4754.[5] L. Apvrille, T. Tanzi, and J.-L. Dugelay, “Autonomous drones forassisting rescue services within the context of natural disasters,” in .IEEE, 2014, pp. 1–4.[6] E. T. Alotaibi, S. S. Alqefari, and A. Koubaa, “Lsar: Multi-uav col-laboration for search and rescue missions,”
IEEE Access , vol. 7, pp.55 817–55 832, 2019.[7] V. K. KV, B. B. Khot, E. Oh, and B. Swain, “Home, office security,surveillance system using micro mobile drones and ip cameras,” Nov. 142017, uS Patent 9,819,911.[8] L. M. Belmonte, R. Morales, A. S. Garc´ıa, E. Segura, P. Novais, andA. Fern´andez-Caballero, “Assisting dependent people at home throughautonomous unmanned aerial vehicles,” in
International Symposium onAmbient Intelligence . Springer, 2019, pp. 216–223.[9] K. Y. Ma, P. Chirarattananon, S. B. Fuller, and R. J. Wood, “Controlledflight of a biologically inspired, insect-scale robot,”
Science , vol. 340,no. 6132, pp. 603–607, 2013.[10] J. James, V. Iyer, Y. Chukewad, S. Gollakota, and S. B. Fuller, “Liftoffof a 190 mg laser-powered aerial vehicle: The lightest wireless robot tofly,” in . IEEE, 2018, pp. 1–8.[11] S. B. Fuller, “Four wings: An insect-sized aerial robot with steering abil-ity and payload capacity for autonomy,”
IEEE Robotics and AutomationLetters , vol. 4, no. 2, pp. 570–577, 2019.[12] Y. M. Chukewad, J. James, A. Singh, and S. Fuller, “Robofly: An insect-sized robot with simplified fabrication that is capable of flight, ground,and water surface locomotion,” arXiv preprint arXiv:2001.02320 , 2020.[13] A. Suleiman, Z. Zhang, L. Carlone, S. Karaman, and V. Sze, “Navion:A fully integrated energy-efficient visual-inertial odometry acceleratorfor autonomous navigation of nano drones,” in . IEEE, 2018, pp. 133–134.[14] F. Mohammed, A. Idries, N. Mohamed, J. Al-Jaroodi, and I. Jawhar,“Uavs for smart cities: Opportunities and challenges,” in . IEEE,2014, pp. 267–273.[15] R. M. Neal,
Bayesian learning for neural networks . Springer Science& Business Media, 2012, vol. 118.[16] Z. Ghahramani, “Probabilistic machine learning and artificial intelli-gence,”
Nature , vol. 521, no. 7553, pp. 452–459, 2015.[17] S. Thrun, “Probabilistic robotics,”
Communications of the ACM , vol. 45,no. 3, pp. 52–57, 2002.[18] A. Kendall and R. Cipolla, “Modelling uncertainty in deep learningfor camera relocalization,” in . IEEE, 2016, pp. 4762–4769.[19] V. Fox, J. Hightower, L. Liao, D. Schulz, and G. Borriello, “Bayesianfiltering for location estimation,”
IEEE pervasive computing , vol. 2,no. 3, pp. 24–33, 2003.[20] F. Dellaert, D. Fox, W. Burgard, and S. Thrun, “Monte carlo localizationfor mobile robots,” in
Proceedings 1999 IEEE International Conferenceon Robotics and Automation (Cat. No. 99CH36288C) , vol. 2. IEEE,1999, pp. 1322–1328.[21] S. Thrun, “Particle filters in robotics,” in
Proceedings of the Eighteenthconference on Uncertainty in artificial intelligence . Morgan KaufmannPublishers Inc., 2002, pp. 511–518.[22] R. W. Wolcott and R. M. Eustice, “Fast lidar localization using multires-olution gaussian mixture maps,” in . IEEE, 2015, pp. 2814–2821. [23] A. Dhawale, K. Shaurya Shankar, and N. Michael, “Fast monte-carlolocalization on aerial vehicles using approximate continuous beliefrepresentations,” in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition , 2018, pp. 5851–5859.[24] H. Huang, H. Ye, Y. Sun, and M. Liu, “Gmmloc: Structure consistentvisual localization with gaussian mixture models,”
IEEE Robotics andAutomation Letters , vol. 5, no. 4, pp. 5043–5050, 2020.[25] Z. Zhang, “Microsoft kinect sensor and its effect,”
IEEE multimedia ,vol. 19, no. 2, pp. 4–10, 2012.[26] R. B. Rusu and S. Cousins, “3d is here: Point cloud library (pcl),” in . IEEE,2011, pp. 1–4.[27] D. Wofk, F. Ma, T.-J. Yang, S. Karaman, and V. Sze, “Fastdepth: Fastmonocular depth estimation on embedded systems,” in . IEEE, 2019,pp. 6101–6108.[28] N. Iliev, A. Gianelli, and A. R. Trivedi, “Low power speaker identifica-tion by integrated clustering and gaussian mixture model scoring,”
IEEEEmbedded Systems Letters , vol. 12, no. 1, pp. 9–12, 2019.[29] A. Gianelli, N. Iliev, S. Nasrin, M. Graziano, and A. R. Trivedi, “Lowpower speaker identification using look up-free gaussian mixture modelin cmos,” in . IEEE, 2019, pp. 1–3.[30] P. Shukla, A. Shylendra, T. Tulabandhula, and A. R. Trivedi, “Mc2ram:Markov chain monte carlo sampling in sram for fast bayesian infer-ence,” in , 2020, pp. 1–5.[31] F. Masuoka, M. Asano, H. Iwahashi, T. Komuro, and S. Tanaka, “Anew flash e 2 prom cell using triple polysilicon technology,” in . IEEE, 1984, pp. 464–467.[32] J. Y. Bak, M.-K. Ryu, S. H. K. Park, C. S. Hwang, and S. M. Yoon,“Nonvolatile charge-trap memory transistors with top-gate structureusing in–ga–zn-o active channel and zno charge-trap layer,”
IEEEElectron Device Letters , vol. 35, no. 3, pp. 357–359, 2014.[33] F. Khan, E. Cartier, J. C. S. Woo, and S. S. Iyer, “Charge trap transistor (ctt): An embedded fully logic-compatible multiple-time programmablenon-volatile memory element for high- k -metal-gate cmos technolo-gies,” IEEE Electron Device Letters , vol. 38, no. 1, pp. 44–47, 2017.[34] A. Shylendra, S. H. Alizad, P. Shukla, and A. R. Trivedi, “Non-parametric statistical density function synthesizer and monte carlo sam-pler in cmos,” in .IEEE, 2020, pp. 19–24.[35] A. Shylendra, P. Shukla, S. Mukhopadhyay, S. Bhunia, and A. R.Trivedi, “Low power unsupervised anomaly detection by nonparametricmodeling of sensor statistics,”
IEEE Transactions on Very Large ScaleIntegration (VLSI) Systems , 2020.[36] A. R. Trivedi and A. Shylendra, “Ultralow power acoustic feature-scoring using gaussian iv transistors,” in
Proceedings of the 55th AnnualDesign Automation Conference , 2018, pp. 1–6.[37] T. K. Moon, “The expectation-maximization algorithm,”
IEEE Signalprocessing magazine , vol. 13, no. 6, pp. 47–60, 1996.[38] K. Lai, L. Bo, and D. Fox, “Unsupervised feature learning for 3dscene labeling,” in . IEEE, 2014, pp. 3050–3057.[39] J. Lee, J. Kang, S. Park, J. Seo, J. Anders, J. Guilherme, and M. P. Flynn,“A 2.5 mw 80 db dr 36 db sndr 22 ms/s logarithmic pipeline adc,”
IEEEJournal of Solid-State Circuits , vol. 44, no. 10, pp. 2755–2765, 2009.[40] S. Aritome,
NAND flash memory technologies . John Wiley & Sons,2015.[41] E. Takeda, Y. Nakagome, H. Kume, and S. Asai, “New hot-carrier injec-tion and device degradation in submicron mosfets,”
IEE Proceedings I(Solid-State and Electron Devices) , vol. 130, no. 3, pp. 144–150, 1983.[42] J. D. Hol, T. B. Schon, and F. Gustafsson, “On resampling algorithms forparticle filters,” in , 2006, pp. 79–82.[43] Q. Xie, X. Lin, Y. Wang, S. Chen, M. J. Dousti, and M. Pedram,“Performance comparisons between 7-nm finfet and conventional bulkcmos standard cell libraries,”