A bi-directional Address-Event transceiver block for low-latency inter-chip communication in neuromorphic systems
Ning Qiao and Giacomo Indiveri
Institute of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland
Email: [qiaoning | giacomo]@ini.uzh.ch

Abstract: Neuromorphic systems typically use the Address-Event Representation (AER) to transmit signals among nodes, cores, and chips. Communication of Address-Events (AEs) between neuromorphic cores/chips typically requires two parallel digital signal buses for Input/Output (I/O) operations. This requirement can become very expensive for large-scale systems in terms of both dedicated I/O pins and power consumption. In this paper we present a compact fully asynchronous event-driven transmitter/receiver block that is both power efficient and I/O efficient. This block implements high-throughput low-latency bi-directional communication through a parallel AER bus. We show that by placing the proposed AE transceiver block in two separate chips and linking them by a single AER bus, we can drive the communication and switch the transmission direction of the shared bus on a single event basis, from either side with low latency. We present experimental results that validate the circuits proposed and demonstrate reliable bi-directional event transmission with high throughput. The proposed AE block, integrated in a neuromorphic chip fabricated using a 28 nm FDSOI process, occupies a silicon die area of 140 µm × µm. The experimental measurements show that the event-driven AE block combined with standard digital I/Os has a direction switch latency of 5 ns and can achieve a worst-case bi-directional event transmission throughput of 28.6 MEvents/second while consuming 11 pJ per event (26-bit) delivery.
I. Introduction

The Address-Event Representation (AER) has been widely used in brain-inspired neuromorphic systems as a communication protocol for transmitting and receiving spikes, encoded as Address-Events (AEs), among spiking silicon neurons and synapses. For example, dynamic vision sensors [1] and silicon cochleas [2] use the AER to transmit their sensory processing outputs to AER neuromorphic processors and transceivers [?], [?], [3], [4], [6]. As these types of neuromorphic VLSI systems typically require AEs to be transmitted with high throughput and low latency, the strategy employed to implement the communication protocol makes use of asynchronous bit-parallel AER channels. This strategy, however, is not scalable, as the width of the parallel bus and the power required to transmit these parallel events scale with the size of the network. In addition, the pin count and power requirements become even larger if one desires to build modular systems with the north/south and east/west Input/Output (I/O) links necessary to tile multiple cores or chips in 2D arrays [4], [6]. Instead of a simple pure parallel AER protocol, some approaches use a “word-serial” protocol that transmits multiple row addresses for every column address serviced (or vice versa) to reduce the pin count [?], [7]. Furthermore, bit-serial Low Voltage Differential Signaling (LVDS) AER has been proposed as a potential solution that transmits events in a fully bit-serial format to reduce the pin count further [8]. However, these approaches significantly increase latency and add overhead in the complexity of the circuit implementation. Moreover, the design proposed in [8] needs additional clock generation and synchronization circuits, which are expensive for fully asynchronous neuromorphic systems.

In this paper, we present a compact fully asynchronous event-driven AE transceiver block which can be easily combined with standard digital I/Os to realize bi-directional inter-chip AE communication through a single parallel AER bus with high throughput and low latency. In Section II, we introduce the architecture of the proposed AE transceiver block. In Section III, we describe the circuits that implement the proposed bi-directional AER block. In Section IV we present experimental results obtained from the measurements of a test chip fabricated in a 28 nm FDSOI process. We present concluding remarks and discussion in Section V.

Fig. 1. Architecture of the proposed bi-directional AE transceiver block. The SW_Control block checks the states of the two linked chips and generates the control signals TX/RX_EN, which allow the TX_Buffer to push events onto the single AER bus or allow the RX_Buffer to take events from it. Bi-directional tri-state buffers are switched by TX/RX_EN to set the bus direction. TX/RX_FIFOs are added to increase the throughput of the proposed AE transceiver block.

II. Architecture

Figure 1 shows the architecture of the proposed AE transceiver block. Bi-directional chip communication can be implemented by connecting two AE transceiver blocks directly with a single shared bit-parallel AER bus. As shown in Fig. 1, the SW_ack and SW_req signals of the two linked AE blocks are swapped and connected, so that each block announces its state to the other.

TABLE I. SW_req/ack states for mode switching

SW_reqL (SW_ackR)   SW_ackL (SW_reqR)   Left Mode   Right Mode
0                   1                   TX          RX
⇑                   ⇓                   TX → RX     RX → TX
1                   0                   RX          TX
⇓                   ⇑                   RX → TX     TX → RX
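As an illustration, Table I can be read as a small lookup from the two SW_ack levels to the pair of modes. The sketch below is a behavioural Python model, not the circuit: the function name `modes` and the rule that a block holds its previous mode while both SW_ack lines carry the same level are assumptions made for this example, since the table only lists the two steady states and the two transitions between them.

```python
# Behavioural sketch of Table I (illustrative; names are hypothetical).
# Each block drives its own SW_ack; the wires are crossed, so
# SW_reqL = SW_ackR and SW_reqR = SW_ackL.

def modes(sw_ack_l, sw_ack_r, previous=("TX", "RX")):
    """Return (left_mode, right_mode) for the given SW_ack levels."""
    if sw_ack_l == 1 and sw_ack_r == 0:
        return ("TX", "RX")          # first steady-state row of Table I
    if sw_ack_l == 0 and sw_ack_r == 1:
        return ("RX", "TX")          # second steady-state row of Table I
    # Assumption: equal levels occur only mid-transition, so hold the mode.
    return previous


# Walk through one full cycle: the right block requests and the left block
# acknowledges, then the left block requests and the right block acknowledges.
state = ("TX", "RX")
trace = []
for sw_ack_l, sw_ack_r in [(1, 0), (1, 1), (0, 1), (1, 1), (1, 0)]:
    state = modes(sw_ack_l, sw_ack_r, state)
    trace.append(state)

for step in trace:
    print(step)
```

Holding the mode during the intermediate step keeps the current transmitter on the bus until it has explicitly released it, which matches the request-then-acknowledge order of the two transition rows.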
Fig. 2. Mode switch scheme of the proposed AER in/out block for bi-directional event transmission. (Timing diagram; traces for chip L, chip R and the shared bus: SW_ackL, SW_ackR, TX_ENL, out_reqL, out_ackL, out_dataL, TX_ENR, out_reqR, out_ackR, out_dataR, bus_req, bus_ack, bus_data.)
The SW_Control block in each AE transceiver block then checks the states of the two connected chips, indicated by SW_ack and SW_req, and generates the control signals TX/RX_EN to allow the TX_Buffer to push events onto the shared AER bus or allow the RX_Buffer to take events from the bus. Each AE block uses SW_ack to signal its own state (logic “1” if the block needs to switch to transmitter mode “TX” to transmit events, and logic “0” if the block currently has no event to transmit and can be switched to receiver mode “RX”), and uses SW_req to read the state of its linked AE block. The SW_Control blocks on both sides generate the control signals TX_EN and RX_EN to switch the TX_Buffer and RX_Buffer on or off and to map the terminals of either the TX_Buffer or the RX_Buffer to the shared bus for a mode switch. Table I shows how the modes are switched in the different cases, with ⇑ representing a transition from logic “0” to “1” and ⇓ a transition from logic “1” to “0”.

Moreover, several conditions must be met for a safe mode switch. An AE transceiver block only requests a mode switch RX → TX, by asserting its SW_ack (⇑), when: 1) the block is currently in “RX” mode; 2) the block has received at least one event in “RX” mode (except when the block has just been reset to “RX” mode by a chip-level global reset); and 3) one or more events need to be transmitted. An AE transceiver block only acknowledges a mode switch request from its linked AE block, by de-asserting its SW_ack (⇓), when: 1) the block is currently in “TX” mode; and 2) it has received a mode switch request.

Figure 2 shows an example of how bi-directional transmission is implemented with the proposed AE transceiver block following 4-phase handshaking. Assume that two AE transceiver blocks are linked by a single AER bus, and that initially SW_ackL of the left block is set to logic “1” and SW_ackR of the right block to logic “0”. The left AE block is then initially in “TX” mode and the right AE block in “RX” mode, allowing event transmission from left to right.
Once an event needs to be transmitted on the right side, SW_ackR is asserted to “1” to request a mode switch. After this request, as soon as the left AE block has no more events to transmit, SW_ackL is de-asserted to “0” to acknowledge the mode switch request. Correspondingly, TX/RX_EN in both blocks are flipped to complete the mode switch.

The bi-directional tri-state buffers shown in Fig. 1 are then switched by TX/RX_EN to map the signals of the TX/RX_Buffer blocks to the shared AER bus. Note that the tri-state buffers can be directly replaced with standard digital I/Os, with TX/RX_EN as the configuration signal for input/output switching. Input and output FIFOs are added to increase the throughput of the proposed AE transceiver block.

III. Circuits Implementation

The proposed AE transceiver block is implemented following a 4-phase handshaking protocol based on the Pre-Charge Half-Buffer (PCHB). Figure 3 shows the circuit implementation of the SW_Control block that controls mode switching. RX_Probe probes whether its AE block has received at least one event as a receiver (RX_P = “1”) in “RX” mode (SW_req = “1”). TX_Probe probes whether its AE transceiver block, as a transmitter in “TX” mode (TX_EN = “1”), currently has no event to transmit (TX_P = “0”) when the linked AE transceiver block requests a mode switch (SW_req ⇑). The Switch Controller sub-block requests a mode switch RX → TX (by asserting SW_ack to “1”) when an incoming event needs to be transmitted (TX_in_req = “1”), provided that its AE transceiver block is currently in “RX” mode (RX_EN = “1”) and has successfully received at least one event (RX_P = “1”) in “RX” mode. Three NFETs in the Switch Controller sub-block, gated by TX_in_req, RX_EN and RX_P, implement these guards.
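The request guard above, together with the acknowledge condition listed in Section II, can be summarised as a set/reset rule for SW_ack. The following is a minimal behavioural sketch in Python under stated assumptions, not the transistor-level circuit of Fig. 3: the function `next_sw_ack` and its argument names are hypothetical, signals are modelled as booleans, and SW_ack is assumed to hold its value whenever neither guard is true.

```python
# Behavioural sketch of the SW_ack update rule (illustrative only).
# Argument names mirror the signal names in the text; the function itself
# is a hypothetical helper, not part of the original design.

def next_sw_ack(sw_ack, sw_req, rx_en, tx_en, rx_p, tx_p, tx_in_req):
    """Return the next SW_ack level of one AE transceiver block."""
    # Request RX -> TX: in RX mode, at least one event received (RX_P),
    # and an event is waiting to be transmitted (TX_in_req).
    if rx_en and rx_p and tx_in_req:
        return True
    # Acknowledge TX -> RX: in TX mode, the linked block requests the bus
    # (SW_req) and nothing is left to transmit (TX_P low).
    if tx_en and sw_req and not tx_p:
        return False
    # Assumption: otherwise SW_ack keeps its current value.
    return sw_ack


# A receiver that has seen one event and now has data to send requests the bus.
print(next_sw_ack(False, True, rx_en=True, tx_en=False,
                  rx_p=True, tx_p=False, tx_in_req=True))
# A busy transmitter does not yet acknowledge a switch request.
print(next_sw_ack(True, True, rx_en=False, tx_en=True,
                  rx_p=False, tx_p=True, tx_in_req=True))
# Once it has drained its events, it releases the bus.
print(next_sw_ack(True, True, rx_en=False, tx_en=True,
                  rx_p=False, tx_p=False, tx_in_req=False))
```

Because RX_EN and TX_EN are never high at the same time, the two guards cannot fire together, so the order of the two checks does not matter in this model.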
The Switch Controller block also acknowledges a request from its linked AE transceiver block for a mode switch TX → RX (SW_req = “1”) if no event currently needs to be transmitted (TX_P = “0”). Two PFETs in the Switch Controller sub-block, gated by SW_reqB and TX_P, implement these guards.

As described in the previous section, if the AE transceiver block has requested a mode switch RX → TX and its linked AE block has acknowledged this request (SW_ack = “1” ∩ SW_req = “0”), this AE block is switched to “TX” mode (TX_EN = “1”). Otherwise, if this AE transceiver block has acknowledged a mode switch TX → RX requested by its linked AE block (SW_req = “1” ∩ SW_ack = “0”), this AE block is switched to “RX” mode (RX_EN = “1”). In the figures, signal names ending in “B” denote inverted signals. SRst and PRst are global reset signals; the logic gates they drive reset the TX and RX Probes to an initial state, for example RX_P is reset to “0” for “TX” mode or “1” for “RX” mode.

Fig. 3. Circuit implementation of the SW_Control block. RX_Probe and TX_Probe are used to probe the state of the block once it is in “RX” mode or “TX” mode. The TX/RX Switch Controller generates the SW_ack signal and further generates TX/RX_EN for its mode switch.

Figure 4 shows the transistor-level circuit implementation of the TX_Buffer based on PCHB, following a 4-phase bundled-data handshaking protocol. The process stage includes Handshaking and Data function blocks. Block ① guarantees that the process stage only deals with incoming events while the linked AE transceiver block is free (SW_req = “0”). Block ② checks whether processing is complete in order to generate the handshaking signal for the previous process stage. Block ③ generates the internal enable signal en to enable functional processing. Matched delay element ④ is added to cover the worst-case latency of the buffer operation from valid input event data to output event. Block ⑤ implements the event buffer function.

The RX_Buffer, which follows a 4-phase dual-rail handshaking protocol based on PCHB, is shown in Fig. 5. Block ① checks whether processing is complete and generates the acknowledge signal RX_in_ack to acknowledge the previous process stage for a valid input and a completed valid output. Block ② generates the internal enable signal en to enable functional processing. A dual-rail protocol (blocks ③ and ④) is used in this RX_Buffer and the following RX_FIFO stage for Quasi-Delay Insensitive (QDI) processes. Validity check block ⑤ indicates that the output data from this process stage is valid.

IV. Experimental Results

The proposed AE transceiver block was implemented and placed at all chip borders of a neuromorphic chip in a 28 nm FDSOI process [9] to implement 2D chip-array bi-directional 26-bit AER communication. Standard digital I/Os with a drive strength of 2 mA are used and internally configured by TX_EN and RX_EN to switch the event transmission direction. As shown in Fig. 6, each AE block occupies a silicon area of 140 µm × µm. By using the proposed AE blocks, we saved 100 I/Os, a significant reduction for a prototype chip with 180 I/Os in total.

To evaluate performance, we first measured single-direction event transmission by continuously sending address events from one direction. As shown in Fig. 7, the AE transceivers of the two linked chips are first reset to transmit from right to left. For events arriving continuously from the left, the AE blocks first need to switch the transmission direction, with a switching latency t_sw of around 5 ns. The latency from a successful mode switch to the assertion of the first request, t_sw2req, is around 5 ns. For continuous single-direction event transmission, the latency between two requests, t_req2req, is around 31 ns, corresponding to a throughput of 32.3 MEvents/second.
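The latencies reported above can be combined into a simple timing estimate. The sketch below assumes, for illustration only, that after the start of a direction switch the first request appears t_sw + t_sw2req later and that subsequent requests follow at a fixed spacing of t_req2req; the function and constant names are hypothetical, and the model ignores any variation between events.

```python
# Back-of-the-envelope timing model for single-direction transmission.
# Values are the measured figures quoted in the text (nanoseconds).
T_SW_NS = 5.0        # direction switch latency
T_SW2REQ_NS = 5.0    # mode switch to first request
T_REQ2REQ_NS = 31.0  # spacing between consecutive requests


def steady_throughput_mevents(t_req2req_ns=T_REQ2REQ_NS):
    """Steady-state rate in MEvents/s for a given request spacing in ns."""
    return 1e3 / t_req2req_ns


def time_to_nth_request_ns(n):
    """Time from the start of the switch to the n-th request (n >= 1).

    Assumes a fixed request spacing after the first event.
    """
    return T_SW_NS + T_SW2REQ_NS + (n - 1) * T_REQ2REQ_NS


print(f"{steady_throughput_mevents():.1f} MEvents/s")
print(f"{time_to_nth_request_ns(1):.0f} ns to the first request after a switch")
print(f"{time_to_nth_request_ns(100):.0f} ns to the 100th request")
```

Under this model the one-off switching cost of about 10 ns is small compared with the 31 ns request spacing, so it matters mainly when the direction changes on nearly every event.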
Fig. 4. Circuit implementation of the 4-phase bundled-data TX_Buffer based on PCHB. The TX_Buffer includes handshaking and data blocks.

Fig. 5. Circuit implementation of the 4-phase dual-rail RX_Buffer based on PCHB. The RX_Buffer includes handshaking, data and validity check blocks.
For bi-directional transmission, we transmitted events from both directions of the two linked AE blocks. As shown in Fig. 8, the request-to-request latency between two events from opposite directions is around 35 ns, giving a worst-case bi-directional throughput of 28.6 MEvents/second. The energy for delivering one 26-bit event is 11 pJ at a 1 V power supply, excluding the power consumption of the digital I/Os. A summary of the key figures of the proposed AE transceiver block is shown in Table II.
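As a quick consistency check on these figures, the throughput follows from the request spacing, and an average power can be estimated from the energy per event. The Python sketch below is illustrative; the constant and variable names are hypothetical, and the power estimate assumes the block runs continuously at the worst-case bi-directional rate and, like the quoted energy, excludes the digital I/Os.

```python
# Consistency check of the bi-directional figures (illustrative only).
T_REQ2REQ_NS = 35.0         # request spacing for alternating directions
ENERGY_PER_EVENT_PJ = 11.0  # measured at 1 V, digital I/Os excluded
BITS_PER_EVENT = 26

throughput_mevents = 1e3 / T_REQ2REQ_NS                    # MEvents/s
energy_per_bit_pj = ENERGY_PER_EVENT_PJ / BITS_PER_EVENT   # pJ/bit
# pJ/event * MEvents/s = 1e-12 J * 1e6 /s = 1e-6 W, i.e. microwatts.
power_uw = ENERGY_PER_EVENT_PJ * throughput_mevents

print(f"throughput: {throughput_mevents:.1f} MEvents/s")
print(f"energy per bit: {energy_per_bit_pj:.2f} pJ")
print(f"power at full rate: {power_uw:.0f} uW")
```

The derived power is an upper-end estimate for the transceiver block alone; at lower event rates an event-driven block would be expected to draw proportionally less.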
Fig. 6. Neuromorphic chip implemented in 28 nm FDSOI with the proposed AE transceiver block combined with standard digital I/Os for bi-directional inter-chip AER communication. Each AE block occupies a silicon area of 140 µm × µm.

Fig. 7. Signal waveforms for continuous one-direction event transmission with a throughput of 32.3 MEvents/second. (Traces versus time: AER_req, AER_ack, SW_reqR and SW_reqL, in volts; annotated latencies: t_req2req = 31 ns, t_req2ack = 5 ns, t_sw = 5 ns, t_sw2req = 5 ns.)

Fig. 8. Signal waveforms for bi-directional event transmission with a throughput of 28.6 MEvents/second. (Traces versus time: AER_req, AER_ack, SW_reqR and SW_reqL, in volts; annotated latency: t_req2req = 35 ns.)

TABLE II. AE bi-directional transmission block circuit key figures.
Process Technology          28 nm FDSOI
Silicon Area                140 µm × µm
Throughput (with IO)        32.3 MEvents/s / 28.6 MEvents/s (bi-directional)
Latency                     5 ns
Energy per Event (26-bit)   11 pJ @ 1 V

V. Conclusions

We presented a compact low-power event-driven bi-directional AE transceiver block for high-throughput and low-latency bi-directional inter-chip communication. The proposed fully asynchronous AE block is compatible with standard digital I/Os, which makes bi-directional inter-chip communication easy to implement while saving half of the I/Os compared with the normal bit-parallel AER protocol. Furthermore, the proposed scheme can be combined with "sub-words" to further reduce the I/O count and power consumption. We designed and fabricated the proposed AE transceiver block in a 28 nm FDSOI process with an area of 140 µm × µm. Combined with standard digital I/Os, we implemented 2D spiking neural network bi-directional chip-array communication. Chip measurements show that the proposed AE transceiver block can achieve a worst-case bi-directional event throughput of 28.6 MEvents/s with an energy per event of 11 pJ at a 1 V supply voltage. The latency for switching the transmission direction between two AE transceiver blocks is around 5 ns.

Acknowledgment

This work is supported by the EU ERC grant “NeuroP” (257219) and by the EU ICT grant “NeuRAM” (687299).

References

[1] T. Delbruck, B. Linares-Barranco, E. Culurciello, and C. Posch, “Activity-driven, event-based vision sensors,” in International Symposium on Circuits and Systems (ISCAS), 2010. Paris, France: IEEE, 2010, pp. 2426–2429.
[2] S.-C. Liu, A. van Schaik, B. Minch, T. Delbruck et al., “Asynchronous binaural spatial audition sensor with 2×64×4 channel output,” Biomedical Circuits and Systems, IEEE Transactions on, vol. 8, no. 4, pp. 453–464, 2014.
[3] N. Qiao, H. Mostafa, F. Corradi, M. Osswald, F. Stefanini, D. Sumislawska, and G. Indiveri, “A re-configurable on-line learning spiking neuromorphic processor comprising 256 neurons and 128k synapses,” Frontiers in Neuroscience […]
[4] […] arXiv preprint arXiv:1708.04198, 2017.
[5] G. Indiveri, F. Corradi, and N. Qiao, “Neuromorphic architectures for spiking deep neural networks,” in Electron Devices Meeting (IEDM), 2015 IEEE International. IEEE, Dec. 2015, pp. 4.2.1–4.2.14. [Online]. Available: http://ncs.ethz.ch/pubs/pdf/Indiveri_etal15.pdf
[6] P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan, B. L. Jackson, N. Imam, C. Guo, Y. Nakamura, B. Brezzo, I. Vo, S. K. Esser, R. Appuswamy, B. Taba, A. Amir, M. D. Flickner, W. P. Risk, R. Manohar, and D. S. Modha, “A million spiking-neuron integrated circuit with a scalable communication network and interface,” Science […]
[7] […] × […] µs latency global shutter spatiotemporal vision sensor,” IEEE Journal of Solid-State Circuits, vol. 49, no. 10, pp. 2333–2341, 2014.
[8] C. Zamarreño-Ramos, R. Kulkarni, J. Silva-Martínez, T. Serrano-Gotarredona, and B. Linares-Barranco, “A 1.5 ns off/on switching-time voltage-mode LVDS driver/receiver pair for asynchronous AER bit-serial chip grid links with up to 40 times event-rate dependent power savings,” IEEE Transactions on Biomedical Circuits and Systems, vol. 7, no. 5, pp. 722–731, 2013.
[9] N. Qiao and G. Indiveri, “Scaling mixed-signal neuromorphic processors to 28 nm FD-SOI technologies,” in