Analog vs. Digital Spatial Transforms: A Throughput, Power, and Area Comparison
Zephan M. Enciso, Seyed Hadi Mirfarshbafan, Oscar Castañeda, Clemens JS. Schaefer, Christoph Studer, Siddharth Joshi
Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA
Department of Electrical and Computer Engineering, Cornell Tech, New York, NY, USA
Department of Information Technology and Electrical Engineering, ETH Zurich, Zurich, Switzerland
Abstract: Spatial linear transforms that process multiple parallel analog signals to simplify downstream signal processing find widespread use in multi-antenna communication systems, machine learning inference, data compression, and audio and ultrasound applications, among many others. In the past, a wide range of mixed-signal as well as digital spatial transform circuits have been proposed. It is, however, a longstanding question whether analog or digital transforms are superior in terms of throughput, power, and area. In this paper, we focus on Hadamard transforms and perform a systematic comparison of state-of-the-art analog and digital circuits implementing spatial transforms in the same 65 nm CMOS technology. We analyze the trade-offs between throughput, power, and area, and we identify regimes in which mixed-signal or digital Hadamard transforms are preferable. Our comparison reveals that (i) there is no clear winner and (ii) analog-to-digital conversion, and not the spatial transform, often dominates area and energy efficiency.
I. Introduction and Contributions
Sensing and processing multiple analog signal channels simultaneously is commonly encountered in a variety of fields including healthcare (ultrasound), multi-antenna communication, machine learning, imaging, and computer vision. Efficiently processing parallel streams of analog signals remains a challenging task due to the increasingly stringent latency and energy requirements imposed on the underlying hardware. Because spatial transforms, in contrast to spectral or time-interleaved transforms, have no temporal dependencies between inputs, they are highly amenable to parallel processing in area- and energy-efficient analog and digital circuits. This property of spatial transforms naturally raises the question of whether spatial transforms are more efficiently implemented using analog circuitry or through digital designs.

Previous work [1] indicates that analog spatial processing can be efficiently implemented using capacitor arrays. These results suggest that analog processing prior to digitization can relax the requirements of the analog-to-digital converters (ADCs), improving the system's overall energy efficiency.
The work of SHM, OC, and CS was supported by ComSenTer, one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA. The work of OC and CS was also supported by Xilinx, Inc. and by the US NSF under grants ECCS-1408006, CCF-1535897, CCF-1652065, CNS-1717559, and ECCS-1824379. SJ was supported in part by NSF/Intel Partnership on Machine Learning for Wireless Networking Systems under grant CNS-2002921.
Digital transforms come in various flavors, including streaming and time-interleaved architectures; see, e.g., [2]. However, not much is known about the efficacy of massively-parallel transforms that are suitable for spatial processing of high-dimensional signals. Most importantly, to the best of our knowledge, no systematic comparison between analog and digital spatial transforms exists, which leaves open the question of which of the two approaches is more beneficial in practice.

This paper represents a first attempt to systematically compare state-of-the-art analog and digital circuit designs with respect to area, throughput, and power for implementing spatial transforms. We focus on analog and digital circuits for spatial Hadamard transforms implemented in the same commercial, general-purpose 65 nm CMOS technology. We first detail the analog and digital circuit designs, provide reference post-layout implementation results, and compare their input and output signal-to-noise ratio (SNR) behaviors. We then study the area efficiency (area per throughput) and energy efficiency (power per throughput) trade-offs by considering the area and power of ADCs. Our comparison enables us to identify operation regimes for which analog or digital designs are preferable.

II. Background
A. Hadamard Transform Basics
In order to compare analog vs. digital spatial transforms, we focus on the Hadamard transform (HT), which finds widespread use for data compression, compressive sensing, imaging, and locality sensitive hashing. The Hadamard transform is essentially a matrix-vector product of a Hadamard matrix $H_m$ by a vector $x \in \mathbb{R}^M$ with $M = 2^m$, i.e., $y = H_m x$. A Hadamard matrix $H_m$ of dimension $2^m \times 2^m$ can be constructed recursively. By defining $H_0 = 1$, we can construct Hadamard matrices for natural numbers $m$ as

$$H_m = \frac{1}{\sqrt{2}} \begin{bmatrix} +H_{m-1} & +H_{m-1} \\ +H_{m-1} & -H_{m-1} \end{bmatrix}. \qquad (1)$$

To avoid an explicit matrix-vector product that involves $M^2 - M$ additions and subtractions, one typically resorts to the fast Hadamard transform (FHT). The FHT repeatedly applies $2^{m-1}$ Hadamard transforms of size 2 (so-called radix-2 butterfly operations $y = H_1 x$) in $m$ stages as illustrated by the dataflow graph in Fig. 1. Note that the scale factors $1/\sqrt{2}$, which ensure that Euclidean norms are preserved, i.e., $\|y\|_2 = \|x\|_2$, can be compensated either in every stage or at the end of the FHT; for the explicit Hadamard transform, the scale factors are typically included at the end of the matrix-vector product. The digital Hadamard transform implementation relies on the FHT, whereas the analog Hadamard transform effectively implements an explicit matrix-vector product using only capacitors.

Fig. 1. Illustration of the dataflow graph of an $M = 8$ fast Hadamard transform (FHT). The FHT consists of $m = \log_2(M) = 3$ stages, each performing $M/2$ two-dimensional Hadamard transforms on permuted inputs.

B. Prior Analog/Mixed-Signal Spatial Transform
The analog circuit implementing the HT closely follows the principles developed in previously fabricated mixed-signal spatial signal processing circuits [3]. The prototype in [3] implements analog matrix-vector multiplication using continuous-time multiplying digital-to-analog converters (MDACs) to form the matrix coefficients, which are then multiplied with differential analog inputs. Using capacitors in this fashion results in highly linear circuits that (i) weight the analog AC signals and (ii) linearly sum them onto a common node, resulting in dB of signal separation performance for real-time beamforming of multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) signals [3]. Each capacitor in the MDAC uses a shielded structure in which driven bottom and top plates shield the internal node from parasitics. By implementing continuous-time weighting of the analog signal, one avoids capacitor switching and thus minimizes both $CV^2$ switching energy and $kT/C$ noise. Consequently, capacitor sizing is primarily determined by matching requirements. We will describe a suitable analog HT design in Section III-A.

C. Prior Digital Spatial Transform
The fast Fourier transform (FFT) is among the most prominent digital spatial transforms and finds widespread use in communication systems, e.g., for beamspace processing [4]. FFT hardware design is an extremely mature area and state-of-the-art FFT designs can be generated automatically with SPIRAL [2]. In contrast, only a handful of custom FHT designs have been reported in the open literature; see, e.g., [5]. Existing parallel FHTs support relatively small dimensions (e.g., up to $M = 16$) and are typically applied to two-dimensional images for data compression. FHTs are extremely hardware-friendly as they only involve additions and subtractions. Furthermore, the simplicity of in-place processing minimizes the storage of sequential HT engines. Nevertheless, not much is known for larger Hadamard transforms that are suitable for spatial processing. We will describe a digital FHT design suitable for spatial processing in Section III-B.

TABLE I
POST-LAYOUT RESULTS FOR 128-POINT ANALOG HADAMARD TRANSFORMS WITH DIFFERENT UNIT CAPACITORS IN 65 NM CMOS

C_unit [fF]   C_unit area [µm^2]   Array area [mm^2]   f_3dB @ 14.4 µS g_driver [GHz]   Cap. mismatch σ_u/C_u [arb. unit]
0.68          2.25                 0.078               4.65                             0.06
1.5           4.41                 0.153               2.55                             0.024
2.0           5.76                 0.200               2.03                             0.016
4.0           10.24                0.356               1.1                              0.01

Fig. 2. Illustration of an × Hadamard transform matrix with details provided for a representative × sub-block. Differentially encoded inputs, $V^+_{ij}$ and $V^-_{ij}$, are either added or subtracted onto an output differential pair, $V^+_{ok}$ and $V^-_{ok}$, through capacitive coupling. The addition/subtraction occurs when the Hadamard transform matrix entry is a $+1$/$-1$.

III. Implementation Details
A. Mixed-Signal Implementation
Our analog HT implements a 128 × 128 HT matrix using a differential capacitor structure as shown in Fig. 2. The inputs and the outputs of this block are continuous-time, differential analog signals. Since the HT is a fixed transform, the design reduces to a compact cell that is repeatedly tiled in layout, with each bottom plate driven by one of the polarities of the differential signals. We place two complementary instances of the array to ensure that both polarities of the signal see a constant capacitive load. The capacitor array was laid out in TSMC 65 nm CMOS, with the capacitors occupying metal layers 4, 5, and 6. The area, maximum frequency, and unit capacitor values entered in Table I are from post-extraction simulation; they were verified against the 10b to 14b data converters [3], [6] that we previously taped out. To derive a realistic cut-off frequency for the system, we set the output conductance of the array drivers to 14 µS. Table I summarizes the design across multiple array sizes. When $C_{\mathrm{unit}} = 4$ fF, the HT capacitor array size is comparable to the digital implementations in Table II. Aggressive scaling of the unit capacitors to sub-femtofarad values results in $f_{3\mathrm{dB}}$ > GHz and consequently $f_{\mathrm{nyq.}}$ > GHz.
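To make the explicit matrix-vector product concrete, the following Python sketch models the analog HT behaviorally. It is a minimal illustration under stated assumptions and not the simulation flow behind the results reported here: each matrix entry is assumed to be realized by one unit capacitor with an independent Gaussian relative error, insertion loss and circuit noise are ignored, and the names `hadamard`, `analog_ht`, and `sigma_rel` are hypothetical. The value `sigma_rel = 0.024` is one of the mismatch figures listed in Table I.

```python
import numpy as np


def hadamard(m):
    """Normalized 2^m x 2^m Hadamard matrix built with the recursion in (1)."""
    H = np.array([[1.0]])  # H_0 = 1
    for _ in range(m):
        H = np.block([[H, H], [H, -H]]) / np.sqrt(2.0)
    return H


def analog_ht(x, m, sigma_rel, rng):
    """Explicit matrix-vector product with per-entry capacitor mismatch.

    Assumption (not from the paper): every entry of H is scaled by
    (1 + e), where e is i.i.d. Gaussian with standard deviation sigma_rel.
    """
    H = hadamard(m)
    mismatch = 1.0 + sigma_rel * rng.standard_normal(H.shape)
    return (H * mismatch) @ x


rng = np.random.default_rng(0)
m = 7  # M = 2^7 = 128 inputs
x = rng.standard_normal(2 ** m)

y_ideal = hadamard(m) @ x
y_analog = analog_ht(x, m, sigma_rel=0.024, rng=rng)

# Mismatch-limited output SNR in dB for this single random draw.
err_db = 10 * np.log10(np.sum(y_ideal ** 2) / np.sum((y_ideal - y_analog) ** 2))
print(f"mismatch-limited SNR: {err_db:.1f} dB")
```

Under this simplified model the expected error energy is `sigma_rel**2` times the signal energy, so the mismatch-limited SNR is roughly `1 / sigma_rel**2` regardless of the transform size; a real capacitor array additionally attenuates the signal, which this sketch does not capture.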
B. Digital Architecture and Implementation
Our digital FHT implements a fully-unrolled decimation-in-frequency architecture using radix-2 butterflies, as illustrated in Fig. 1. The 128-point FHT implementation consists of $m = 7$ stages, where each stage contains 64 radix-2 butterflies that perform addition and subtraction of the two inputs. Since the output bitwidth of an adder/subtractor is one bit more than that of its input, we allow the odd-numbered stages to increase the bitwidth by one; the even-numbered stages apply a scale factor of 1/2, thereby maintaining the bitwidth. Consequently, the outputs of the design have only 4b more resolution than the inputs, which reduces area and ensures proper normalization of the FHT. In order to minimize the critical path of our FHT design, the outputs of each stage are pipelined.

Table II shows post-layout results for 128-point FHTs ranging from 5b to 10b input precision in TSMC 65 nm CMOS. We note that these are, to the best of our knowledge, the first implementation results of digital 128-point Hadamard transforms reported in the open literature. The cell density is around 80% for all digital designs. Since our architecture is fully unrolled and pipelined, the maximum sustained throughput (in transforms per second) equals the maximum clock frequency. The area and net power consumption scale roughly linearly with the number of input bits, and the precision has a marginal effect on the maximum clock frequency.

TABLE II
POST-LAYOUT RESULTS FOR 128-POINT DIGITAL FAST HADAMARD TRANSFORMS (FHTS) WITH 5B TO 10B INPUT PRECISION IN 65 NM CMOS

Input res. [bit]   Area [mm^2]   Max. freq. [GHz]   Power [mW]   Area eff. [mm^2/GT/s]   Energy eff. [pJ/T]
5                  0.195         1.603              346.7        0.122                   216.4
6                  0.236         1.605              431.4        0.147                   268.8
7                  0.277         1.439              440.6        0.192                   306.2
8                  0.314         1.429              517.0        0.219                   361.9
9                  0.341         1.431              575.9        0.239                   402.5
10                 0.394         1.377              617.1        0.287                   448.0

Fig. 3. Comparison methodology: (a) analog transform; (b) digital transform. For the analog transform, we first apply the Hadamard transform using passive capacitor circuits followed by converting the analog signal using 128 ADCs; for the digital transform, we first use 128 ADCs followed by the digital fast Hadamard transform (FHT).

IV. Comparison
A. Methodology
Fig. 3 illustrates the comparison methodology used in this paper. In order to arrive at a fair comparison between both approaches, we include the area and power of analog-to-digital converters (ADCs) that would otherwise be present in a real-world system. Additionally, we account for signal attenuation incurred during the analog transform ( . dB for the 128-point HT) by correspondingly increasing the SNR requirement of the downstream ADC. To this end, for the analog transform, we first use the analog Hadamard transform design detailed in Section III-A followed by a dedicated ADC for each of the 128 analog outputs. For the digital transform, we first use a set of 128 ADCs to convert the analog inputs followed by the digital FHT design. For both transform designs, we pick ADCs from [7] whose resolution matches the signal-to-quantization-noise ratio (SQR) of the analog or digital transform and whose bandwidth matches the maximum bandwidth achievable by the individual designs.

Fig. 4. Input vs. output SNR for analog and digital Hadamard transforms: (a) analog transform; (b) digital transform. (a) Shows the effect of quantization and capacitor mismatch for the analog HT implemented using a capacitor array composed of . fF unit capacitors. The shaded area represents the spread in achievable output SNR, with a solid line representing the lowest point for a 90% yield. At an input SNR of 20 dB, the spread in output SNR due to mismatch is highlighted by the dotted lines. (b) Shows the output precision of the digital FHT design.
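The resolution-matching step can be illustrated with a small Python sketch. It is a simplification under stated assumptions and not the selection procedure applied to the survey data in [7]: it uses the textbook relation SQNR ≈ 6.02 N + 1.76 dB for an ideal N-bit quantizer driven by a full-scale sinusoid, the function name `required_adc_bits` is hypothetical, and the 40 dB target and 12 dB attenuation are placeholder values chosen only for illustration.

```python
import math


def required_adc_bits(target_sqnr_db, attenuation_db=0.0):
    """Smallest ideal-ADC resolution that meets an SQNR target.

    The attenuation of the analog signal path (insertion loss) is added
    to the target, mirroring the increased SNR requirement placed on the
    ADC that follows an analog transform. Assumes the ideal-quantizer
    relation SQNR = 6.02 * N + 1.76 dB.
    """
    needed_db = target_sqnr_db + attenuation_db
    return max(1, math.ceil((needed_db - 1.76) / 6.02))


# Placeholder numbers, not values from the paper.
digital_bits = required_adc_bits(40.0)                      # ADC ahead of the digital FHT
analog_bits = required_adc_bits(40.0, attenuation_db=12.0)  # ADC after the analog HT
print(digital_bits, analog_bits)
```

In this idealized picture, every 6 dB of attenuation in the analog path costs roughly one additional bit of ADC resolution, which is why the insertion loss of the capacitor array feeds directly into ADC power and area.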
B. Input SNR vs. Output SNR
As a first step, we study the accuracy and linearity of the two approaches. To characterize the input and output SNR, we consider the input signal model $x = s + n$, where $s$ is the signal vector and $n$ is the noise vector; both are i.i.d. zero-mean Gaussian. The signal and noise variances are determined by the input SNR. We then measure the output SNR as

$$\mathrm{SNR}_{\mathrm{out}} = \frac{\mathbb{E}\left[\|y\|_2^2\right]}{\mathbb{E}\left[\|y - \hat{y}\|_2^2\right]}, \qquad (2)$$

where $y = Hs$ is the output of an ideal, noise-free Hadamard transform and $\hat{y}$ is the quantized output of transforming $x = s + n$ using either the analog HT or the digital FHT.

For the analog design, we consider the effect of capacitor mismatch on the HT. All analog HT results were extracted from 400 Monte Carlo trials of capacitor mismatch with 400 trials per SNR. Using the methodology described in [8] and our fabricated IC [3], we estimate the mismatch coefficient for the capacitors to be $A = 2\%\,\sqrt{\mathrm{fF}}$. Fig. 4(a) shows the effect of this mismatch for $C_{\mathrm{unit}}$ = 0. fF on the SNR of a transformed signal, for various output ADC resolutions. At a target input SNR of 20 dB, the mismatch creates a spread of possible values; the dotted lines in Fig. 4(a) indicate the maximum and minimum output SNRs observed over 400 Monte Carlo trials for an input SNR of 20 dB. For the digital transform, we use a bit-true golden model to extract the output SNR via Monte Carlo simulations. Fig. 4(b) shows the SNR transfer behavior of the digital FHT. We observe that the output SNR is lower than that of the analog transform for less than 7b input resolution; for higher resolution, the digital FHT achieves higher output SNR.

Fig. 5. Energy and area efficiency vs. output SNR trade-offs: (a) energy efficiency; (b) area efficiency excluding ADCs; (c) area efficiency including ADCs. (a) Although the analog design with . fF unit capacitors achieves higher $f_{3\mathrm{dB}}$, operating at such frequencies requires expensive ADCs, which annihilate the benefit of compact analog circuitry. The analog design with fF unit capacitors achieves lower $f_{3\mathrm{dB}}$, which is conducive to power-efficient ADCs. For the digital FHT, the ADC power is comparable to that of the digital part. (b) Shows the area efficiency without the ADC area, which reveals that analog transforms can be more compact and suffer from no area increase due to the fixed array size. (c) Shows the area efficiency with the ADC area, which shows that the ADC area is substantial, effectively resulting in designs of comparable efficiency.

C. Area-efficiency and Energy-efficiency Trade-offs
Fig. 5(a) compares the energy efficiency obtained from two analog configurations (with unit capacitors fF and . fF) and the digital implementations. While the analog HT design with the smaller unit capacitor operates at a higher bandwidth, the energy and area overheads of high-frequency ADCs are detrimental to the combined system efficiency. Indeed, the fF array shows better energy efficiency than the . fF array, primarily due to a more energy-efficient ADC. As expected, at higher resolutions (output SNR ≥ dB), the digital design is more energy-efficient. Examining the energy contribution of the ADCs shows that the ADC power is comparable to the power of the digital FHT, but it dominates the power of the analog HT. This disparity is explained by the ADC SNDR increasing by dB to compensate for capacitor-induced attenuation in the analog signal path (insertion loss).

Fig. 5(b) compares the area efficiency of the three designs, where we exclude the ADC area. In this comparison, the analog circuits are much more area efficient, with the smaller array ($C_{\mathrm{unit}}$ = 0. fF) delivering an order of magnitude higher throughput than the digital FHT. However, when ADC area is included in the comparison, Fig. 5(c) reveals that this advantage is immediately negated. Indeed, the area efficiency for all three designs now becomes comparable, in part due to the costly ADCs required for high-speed operation. Moreover, we cannot identify a clear design point that is better across categories: while the slower operation due to larger capacitors leads to improved energy efficiency, the larger area also reduces throughput. As expected, the digital FHT is consistently better than analog HTs at very high resolution when ADC overheads are completely accounted for.

V. Conclusions and Outlook
We studied the area and energy efficiency of implementing spatial Hadamard transforms through passive analog circuits and massively-parallel digital circuits. All of our designs have been implemented in the same 65 nm CMOS technology. Our analysis reveals that neither design is an outright winner in all categories. We note that the Hadamard transform uniquely advantages the analog design, leading to extremely compact and energy-efficient implementations. Despite this, our analysis reveals that the ADCs heavily influence the overall area and energy efficiency of spatial Hadamard transforms, indicating that further optimizations must include data converter design. For analog spatial transforms to truly deliver, we would need (i) ADCs that are co-designed with the analog processing and (ii) circuit topologies that exploit transform sparsity to minimize insertion loss. Finally, an extensive comparison between analog and digital spatial Fourier transforms, which are useful for emerging millimeter-wave communications systems, is part of future work.

References

[1] S. Joshi, C. Kim, S. Ha, and G. Cauwenberghs, “From algorithms to devices: Enabling machine learning through ultra-low-power VLSI mixed-signal array processing,” in IEEE Custom Integrated Circuits Conference (CICC), Apr. 2017, pp. 1–9.
[2] M. Püschel, J. M. Moura, J. R. Johnson, D. Padua, M. M. Veloso, B. W. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko et al., “SPIRAL: code generation for DSP transforms,” Proceedings of the IEEE, vol. 93, no. 2, pp. 232–275, Jun. 2005.
[3] S. Joshi, C. Kim, S. Ha, Y. M. Chi, and G. Cauwenberghs, “21.7 2pJ/MAC 14b 8 × ” in IEEE International Solid-State Circuits Conference (ISSCC), Feb. 2017, pp. 364–365.
[4] S. H. Mirfarshbafan and C. Studer, “Sparse Beamspace Equalization for Massive MU-MIMO mmWave Systems,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2020, pp. 1773–1777.
[5] Y.-W. Huang, B.-Y. Hsieh, T.-C. Chen, and L.-G. Chen, “Analysis, fast algorithm, and VLSI architecture design for H.264/AVC intra frame coder,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 3, pp. 378–401, Feb. 2005.
[6] C. Kim, S. Joshi, C. M. Thomas, S. Ha, L. E. Larson, and G. Cauwenberghs, “A 1.3mW 48MHz 4 channel MIMO baseband receiver with 65dB harmonic rejection and 48.5dB spatial signal separation,” IEEE Journal of Solid-State Circuits, vol. 51, no. 4, pp. 832–844, 2016.
[7] B. Murmann et al., “ADC performance survey 1997-2020,” 2020.
[8] H. Omran, H. Alahmadi, and K. N. Salama, “Matching Properties of Femtofarad and Sub-Femtofarad MOM Capacitors,”