A clock-less ultra-low power bit-serial LVDS link for Address-Event multi-chip systems
Ning Qiao
Institute of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland. Email: [email protected]
Giacomo Indiveri
Institute of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland. Email: [email protected]
Abstract: We present a power-efficient, clock-less, fully asynchronous bit-serial Low Voltage Differential Signaling (LVDS) link with event-driven instant wake-up and self-sleep features, optimized for high-speed inter-chip communication of asynchronous address-events between neuromorphic chips. The proposed LVDS link makes use of the Level-Encoded Dual-Rail (LEDR) representation and a token-ring architecture to encode and transmit data, avoiding the use of conventional large Clock-Data Recovery (CDR) modules with power-hungry DLL or PLL circuits. We implemented the LVDS circuits in a device fabricated with a standard 0.18 µm CMOS process. The total silicon area used for this block is 0.14 mm². We present experimental measurement results to demonstrate that, with a bit rate of 1.5 Gbps and an event width of 32 bits, the proposed LVDS link can achieve transmission event rates of 35.7 M Events/second with current consumption of 19.3 mA and 3.57 mA for receiver and transmitter blocks, respectively. Given the clock-less and instant on/off design choices made, the power consumption of the whole link depends linearly on the data transmission rate. We show that the current consumption can go down to sub-µA for low event rates (e.g., <1k Events/second), with a floor of 80 nA for the transmitter and 42 nA for the receiver, determined mainly by static off-leakage currents.
I. Introduction
The Address-Event Representation (AER) protocol has been widely used in neuromorphic computing systems to connect multiple cores and chips together [1]–[6], in single-chip devices for encoding sensory signals [7] or for implementing spike-based learning mechanisms [8], [9], and in multi-chip sensory-processing systems [10]–[12]. By exploiting the asynchronous principle, the AER protocol is extremely efficient for event-driven neural systems in terms of power consumption and low latency. Bit-parallel AER is the most commonly used implementation, due to its ease of design and configuration. This strategy however is not scalable, as the width of the parallel bus and the power required to transmit these parallel event bits scale with the size of the network. This can become a critical issue for large scale neuromorphic systems, which typically employ multiple copies of AER buses for routing events to multiple destinations and receiving events from multiple sources [1], [3], [4]: these systems are normally arranged and tiled in 2D arrays with North-South, East-West, and possibly diagonal Input/Output (I/O) links between them. This requires a very large pin-count and can lead to significant leakage and dynamic power consumption. Rather than using the full parallel AER protocol, some approaches have resorted to employing a “word-serial” protocol, which groups multiple row addresses for a given column address to reduce pin count [5], [13]. However, it has been argued that one of the most efficient solutions for transmitting AER data, in terms of both speed and power consumption, is to use a bit-serial Low Voltage Differential Signaling (LVDS) scheme [14].
Event rates in neuromorphic systems tend to be sparse, but to have high peak values [13]. As the time information is typically important, low latency is an essential requirement.
Traditional LVDS schemes are designed for continuous data transmission with power consumption that depends on clock frequency and is independent of the input data rate. In some LVDS schemes it is possible to send idle comma characters and signal a pause in the data transmission. However, these idle states may cause loss of synchronization between transmitter and receiver, and many (e.g., on the order of hundreds) clock cycles are typically required for these lock recoveries. Therefore, as traditional LVDS implementations are likely to cause significant latency for event transmission, they are not suitable for AER neuromorphic systems. Previous approaches have proposed to optimize the Clock-Data Recovery (CDR) scheme so that the phase lock of transmitter and receiver can be recovered on the fly [14], but they required additional clock generation and synchronization circuits, such as DLL or PLL circuits, for the CDR, which are very expensive in terms of power and area requirements. The event-based nature of AER data transmission in neuromorphic systems calls for the development of a new fully asynchronous clock-less event-based switchable bit-serial AER LVDS link that does not need clock recovery circuits. In this paper we propose a new clock-less LVDS scheme optimized for neuromorphic systems, and demonstrate its implementation in a prototype chip, fabricated using a standard 0.18 µm CMOS process. We show that the chip designed successfully implements the following features:
1) Pure asynchronous design without PLL/DLL for CDR.
2) Instant on (< [...] multi-chip neuromorphic systems.
Fig. 1: Encoding example of LEDR. Shaded regions represent the even phase and the non-shaded regions represent the odd phase.
The paper is organized as follows: Section II presents the data transmission scheme and link architecture; Section III describes the circuit implementation of the proposed bit-serial LVDS link; Section IV presents the measurements made with the prototype chip and describes the experimental results; Section V briefly concludes the work.
II. Encoding Scheme and Architecture
A. Data encoding
It is possible to implement a clock-less fully asynchronous event-driven bit-serial LVDS link by choosing a proper data encoding scheme that eliminates the need for traditional CDR, which is expensive for asynchronous systems. One scheme that is optimally suited for AER data is the LEDR signaling scheme [15]. In LEDR signaling, data bits are encoded using two rails: given a sequence of bits, a data rail is used to represent the bit value and a parity rail is used to represent the parity relative to the encoding phase and data rail. The encoding alternates between an odd and an even phase. In the odd phase the parity rail takes the inverted bit value, and in the even phase the parity rail takes the same bit value as the data rail. Formally, the data rail value $D[i]$ and the parity rail value $P[i]$ are:
$$
\begin{cases}
D[i] = B[i];\; P[i] = \overline{B[i]} & \text{for odd phase} \\
D[i] = B[i];\; P[i] = B[i] & \text{for even phase}
\end{cases}
$$
where $B[i]$ represents the encoded bit value of the sequence. Figure 1 shows an encoding example for an 8-bit data sequence.
LEDR is a Delay Insensitive (DI) protocol: sequential bits can easily be distinguished by checking whether $D[i] = P[i]$ or not. So, by encoding address-event data strings using LEDR, it is possible to build fully asynchronous bit-serial LVDS links without using a clock generation block or a clock synchronization block for CDR.
According to this LEDR encoding scheme, it is possible to implement both an asynchronous encoder and an asynchronous decoder. On the encoder side, the data rail should always take the original sequence bit value, while the parity rail should take the inverted sequence bit value for the odd phase and the original sequence bit value for the even phase. On the decoder side, it is sufficient to check whether $P[i] = D[i]$ or $P[i] = \overline{D[i]}$ to determine the bit phase, and then to read incoming bits one by one. This scheme leads to a very compact design in terms of hardware resources. Because LEDR encoding follows a two-phase handshaking (or Non-Return-to-Zero (NRZ)) protocol, it allows a full bit rate and provides a significant bandwidth advantage compared to alternative schemes based on Phase Encoding (PE) or Dual-Rail Return-to-Zero (DR-RZ) methods.
Fig. 2: A typical 8-bit transceiver based on token-ring architecture. Token-cells are labeled as “TCell”.
B. LVDS with token-rings
Token-ring schemes have already been proposed for asynchronous sequential data transmission [16]. A token-ring comprises a number of mutually exclusive token-cells that transmit their data content one by one. Figure 2 shows a typical 8-bit transceiver based on token-ring architecture [16]. Token-cells in the “Transmitter block” are activated sequentially to take one bit at a time from a parallel data bus and to write it on a shared interconnection link. Token-cells in the “Receiver block” take bit values sequentially from the shared bus to reconstruct the parallel data.
A token-ring based serializer can be built following the LEDR scheme to sequentially encode both data and corresponding parity bits from a parallel bus to a shared serial one. Accordingly, a token-ring based de-serializer can be built to de-serialize and decode the data by taking data bit-by-bit from shared data/parity wires. The block diagram of the asynchronous bit-serial LVDS link we propose based on these concepts is shown in Fig. 3. It comprises the following blocks: “Input Buffer”, “TX Token-Ring”, “RX Token-Ring”, “LVDS Drivers”, “LVDS Receivers”, “Output Buffer” and “Control Queue”.
Fig. 3: Architecture of the proposed bit-serial LVDS link.
The “TX Token-Ring” block is implemented to serialize and encode event parallel bits into data and parity rails following the LEDR scheme. The “RX Token-Ring” is implemented to de-serialize and reconstruct parallel bits from data and parity rails. The “LVDS Drivers” convert data and parity rails into low-voltage differential signals for low-power consumption and high-speed inter-chip data transmission. Similarly, the “LVDS Receivers” convert LVDS signals back to normal digital signals. In order to minimize power consumption and make it depend only on the event-rate, we propose a novel instant on/off scheme for LVDS Drivers and Receivers, described in the following section.
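The serialize/encode and de-serialize/decode roles of the two token-rings can be illustrated in software. The following is a minimal behavioral sketch, not the circuit itself: the function names are hypothetical, and it assumes that the first bit of a word is sent in the odd phase (parity rail inverted) and the next in the even phase (parity rail equal to the data rail). The decoder uses no timing information and accepts a sample only when its phase matches the one expected next.

```python
def ledr_encode(bits):
    """Serialize a word into (data, parity) rail symbols.

    Assumed convention: 1st, 3rd, ... bits use the odd phase (parity = NOT data),
    2nd, 4th, ... bits use the even phase (parity = data).
    """
    symbols = []
    for i, b in enumerate(bits):
        odd_phase = (i % 2 == 0)  # the first bit of the word is an odd-phase bit
        symbols.append((b, 1 - b) if odd_phase else (b, b))
    return symbols


def ledr_decode(samples):
    """Recover the bits from (data, parity) samples without any clock.

    A sample is taken only if its phase (data != parity means odd) is the
    phase expected next, so repeated samples of one symbol are ignored.
    """
    bits = []
    expect_odd = True
    for d, p in samples:
        if (d != p) == expect_odd:
            bits.append(d)
            expect_odd = not expect_odd
    return bits


def rail_toggles(symbols):
    """Number of rails that change between consecutive symbols."""
    return [(a[0] != b[0]) + (a[1] != b[1]) for a, b in zip(symbols, symbols[1:])]


word = [1, 0, 0, 1, 1, 1, 0, 1]
symbols = ledr_encode(word)
# Sample every symbol three times to mimic a receiver with no notion of bit timing.
oversampled = [s for s in symbols for _ in range(3)]
recovered = ledr_decode(oversampled)
print(recovered == word, rail_toggles(symbols))
```

In this sketch exactly one rail toggles between consecutive symbols, which is what lets the receiver detect each new bit from the rails alone.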
Finally, the data transmission is done in a “burst mode”, such that the acknowledge signal is returned once per address-event word, rather than bit-by-bit. Address-event input and output buffers are included to pipeline the transmission cycle and increase data depth on both sides. A small “Control Queue” block with the same depth as the output buffer is employed to pre-store multiple acknowledges, so that the transmitter can keep on sending events without waiting for their corresponding acknowledge signals to arrive, in order to minimize latency.
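The effect of pre-stored acknowledges can be sketched with a simple credit-based timing model. This is an illustrative assumption about how such a queue behaves, not a description of the actual “Control Queue” circuit: the function name and the model are hypothetical, and the 28 ns event period and 31 ns acknowledge latency are the values reported in Section IV.

```python
def send_times(n_events, period_ns, ack_latency_ns, queue_depth):
    """Start time of each event under a credit-based flow-control model.

    Assumptions of this sketch: an event can start one period after the
    previous one, provided a pre-stored acknowledge (credit) is available;
    the credit used by event k is returned ack_latency_ns after k started.
    """
    starts = []
    for k in range(n_events):
        t = starts[-1] + period_ns if starts else 0.0
        if k >= queue_depth:
            # No credit left: wait for the acknowledge of event k - queue_depth.
            t = max(t, starts[k - queue_depth] + ack_latency_ns)
        starts.append(t)
    return starts


# With a single credit every event waits for the previous acknowledge.
print(send_times(4, 28, 31, queue_depth=1))
# With two credits the link runs back-to-back at the 28 ns event period.
print(send_times(4, 28, 31, queue_depth=2))
```

Under this model, a queue depth of at least the acknowledge latency divided by the event period (rounded up) is enough to hide the acknowledge round trip.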
C. Instant On/Off driver and receiver
Instant on/off LVDS drivers and receivers that implement event-driven wake-up and sleep-mode mechanisms are crucial for minimizing power consumption in neuromorphic systems that operate with sparse activity and low average event rates. Since the main digital blocks communicate with each other following a four-phase handshaking protocol, no dynamic power is dissipated in idle states. The “LVDS Drivers” can be easily turned on or off by a digital signal, such as TX.r of Fig. 3, as new event data appears on the “Input Buffer” block. In order to turn the “LVDS Drivers” on and off instantly, we exploited the common-mode voltage of the LVDS pairs. As shown in Fig. 4, during the idle state, when no data is being transmitted, the two pairs of LVDS signals are both pulled down to Gnd, resulting in a 0 V common-mode voltage. In this way the “LVDS Receivers”, designed with NMOS input transistors, will be fully turned off and power consumption will be due only to off-leakage static power dissipation. As soon as a new event arrives, the common-mode feedback circuit will drive both data pair and parity pair voltage lines back to a V_ref common-mode voltage, which is set to about 1 V in this design. Simultaneously, the differential voltages of the data pair and parity pair will recover their previous bit values with $D = P$. In this way the “LVDS Receivers” with NMOS input transistors will be turned on and will start to convert the LVDS signals to standard digital ones. Because the first odd token-cell in the decoder will only take data when $P[i] = \overline{D[i]}$, the receiver will ignore potential spurious repeated LSB bits until a new MSB bit arrives. After transmitting the full word, the common-mode voltage of the LVDS pairs will be pulled down to Gnd again, turning the receiver off. The recovery speed of the common-mode voltage is controlled by a common-mode feedback circuit in the LVDS driver. In our measurements, the recovery latency of the common-mode voltage is less than 0.5 ns, which is much shorter than previously reported values (e.g., 6.6 ns in [17]).
Fig. 4: Proposed signaling scheme with LVDS for data transmission and common-mode voltage for instant on/off receiver.
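The wake-up behavior described above can be mimicked with a small behavioral model. It is a sketch under stated assumptions: the names are hypothetical, the idle state is modeled as "no sample" (both pairs at Gnd, receiver off), and the event width is even so that the previous LSB was sent in the even phase with D = P. The model shows why the repeated LSB seen during common-mode recovery is not mistaken for a new bit.

```python
IDLE = None  # both LVDS pairs pulled to Gnd: the receiver is off and sees nothing


def link_waveform(prev_lsb, word, wake_samples=4):
    """Sampled (data, parity) activity for one event, including wake-up.

    During wake-up the driver re-presents the LSB of the previous event with
    D = P (even phase); the new word then starts with an odd-phase bit.
    """
    wave = [IDLE] * 3
    wave += [(prev_lsb, prev_lsb)] * wake_samples
    for i, b in enumerate(word):
        wave.append((b, 1 - b) if i % 2 == 0 else (b, b))
    wave += [IDLE] * 3
    return wave


def receive_event(samples, width):
    """Token-ring style receiver: the first (odd) cell only accepts D != P."""
    bits = []
    expect_odd = True
    for s in samples:
        if s is IDLE:
            continue
        d, p = s
        if (d != p) == expect_odd:
            bits.append(d)
            expect_odd = not expect_odd
            if len(bits) == width:
                break
    return bits


word = [1, 0, 0, 1, 1, 1, 0, 1]
for prev_lsb in (0, 1):
    print(prev_lsb, receive_event(link_waveform(prev_lsb, word), len(word)))
```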
D. Transmission Scheme
Figure 5 describes the timing diagram for the transmission of one event in the proposed bit-serial LVDS link. A four-phase handshaking protocol is implemented between “Input Buffer” and “TX Token-Ring”. Once the event data D<n−1:0> appears on the “TX Token-Ring” input bus TX.in, the signal TX.r will be set to high by the “Input Buffer”, thus requesting a new data transmission, which will trigger the first-stage token-cell of “TX Token-Ring” to take the first bit. Meanwhile, this request signal will turn on the LVDS drivers to be ready for sending new data. After a tunable delay t_wd, the first odd token-cell of “Token-Ring” will push the new bit value $D = D\langle n-1 \rangle$ and parity $P = \overline{D\langle n-1 \rangle}$ to the shared data and parity wires Data and Parity. It will then enable the following stage, i.e., the first even token-cell, to take a new bit value. After a cycle delay t_d, the enabled stage will disable the previous stage and push the new data/parity $D = D\langle n-2 \rangle$ and $P = D$ on the shared wires, while enabling its following stage. Mutual exclusion is implemented stage by stage until the end of the token-ring. After pushing the data/parity of the last bit value to the shared wires, the “TX Token-Ring” block will acknowledge the “Input Buffer” by asserting Enc.a to high. Subsequently, TX.r will be reset to low for the successful removal of data D<n−1:0>. Finally, TX.a will be reset to low to complete the four-phase handshaking cycle.
Fig. 5: Event transmission timing diagram of the bit-serial LVDS link. The “Stb” represents the stand-by state.
It should be noted that during the wake-up stage the “LVDS Receivers” will need to be turned on for recovering the common-mode voltage of the LVDS pairs with data and parity value $P = D$. The first token-cell of “RX Token-Ring” will only take data and parity with $P = \overline{D}$. For event data with even bit width, a safe approach is to fully recover both common-mode and differential values of the previous bit by repeating the LSB of the previous event data with data $D = D\langle 0 \rangle$ and parity $P = D\langle 0 \rangle$.
The mutually exclusive token-cells of “RX Token-Ring” will take data from the LVDS receivers bit-by-bit. Each bit cycle is distinguished by either $P = D$ or $P = \overline{D}$. The output of each token-cell is latched once the current token-cell is disabled by its successor. As soon as the last token-cell gets its bit, it will request “Output Buffer” to take the whole data packet from all token-cells and reset the “RX Token-Ring”. In this design the “RX Token-Ring” is required to have the highest throughput. The tunable delay t_d is added in “TX Token-Ring” to enforce the timing assumption that “RX Token-Ring” has a higher throughput than “TX Token-Ring”, so that each sequence bit is taken within one TX bit cycle.
Fig. 6: Transmitter Token-Ring for encoding data into data/phase scheme in proposed bit-serial LVDS link.
Fig. 7: Circuit implementation of the TX token-cell based on a bit-buffer. Each token-cell comprises “Handshaking”, “Validity Check”, “Bit Buffer”, “Data Buffer” and “Odd/Even Parity Buffer” blocks.
III. Circuits Implementation
A. Transmitter Token-Ring
The block diagram of “TX Token-Ring” is shown in Fig. 6. A dual-rail asynchronous protocol and four-phase handshaking are used for processing input data. The “TX Token-Ring” comprises an input “Validity Check” block, “Token-Ring” with odd and even token-cells, “LVDS Drivers” and a “Control Queue” block. The “Validity Check” block first checks and indicates valid input event data by TX.r. The “LVDS Drivers” can then be turned on by TX.r for a valid input event. Meanwhile, the first token-cell starts to take the first bit value TX.f<n−1>/TX.t<n−1> and push the corresponding data and parity outputs to the shared wires. For odd bits, data and parity outputs are $D = B$ and $P = \overline{D}$, respectively, while for even bits, they are $D = B$ and $P = D$, respectively. So the first odd token-cell will push D = TX.t<n−1> to the shared data wire and P = TX.f<n−1> to the shared parity wire. After a tunable delay t_wk, when the first token-cell has successfully pushed the data and parity values of the MSB of the input event to the shared data and parity wires, the first token-cell will send the enable signal to enable its successor for the next bit value. As a response, its successor will send back the disable signal as soon as it successfully takes a bit value. After a set “bit cycle” time t_d, when the last token-cell pushes its output to the shared data and parity wires, Enc.a will be asserted to high to reset the whole “TX Token-Ring” and acknowledge “Input Buffer” to erase old data, and will be reset to low to acknowledge that old data has returned to zero (TX.f<n−1:0> = 0, TX.t<n−1:0> = 0). [...] in.v. The “Handshaking” block generates the acknowledge signal in.a to acknowledge a valid bit input and the control signal en to enable the “Bit Buffer” block for buffering the current input bit value (en = [...] en = [...] out.t and out.f to data and parity values according to the LEDR protocol and push them to the shared Data and Parity wires. Once the current token-cell generates a valid output, which is indicated by out.v = 1, it will enable its successor and disable its predecessor for mutual exclusion.
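The dual-rail validity check and the four-phase handshake used at the input of the “TX Token-Ring” can be sketched as follows. This is an illustrative model only: the function names are hypothetical, and it assumes the usual dual-rail convention in which each bit uses a true rail and a false rail, exactly one of which is high for valid data and both of which are low in the neutral (returned-to-zero) state.

```python
NEUTRAL = (0, 0)  # both rails low: no data present


def to_dual_rail(bits):
    """Map each bit to a (t, f) rail pair: 1 -> (1, 0), 0 -> (0, 1)."""
    return [(b, 1 - b) for b in bits]


def is_valid(word):
    """A word is valid when every bit has exactly one rail high."""
    return all(t + f == 1 for t, f in word)


def is_neutral(word):
    """A word is neutral when every rail has returned to zero."""
    return all(pair == NEUTRAL for pair in word)


def four_phase_transfer(bits):
    """Return the (request, acknowledge) trace of one four-phase cycle."""
    trace = []
    word = to_dual_rail(bits)
    req = is_valid(word)              # 1) valid data raises the request
    trace.append((int(req), 0))
    ack = req                         # 2) the word is consumed, acknowledge rises
    trace.append((int(req), int(ack)))
    word = [NEUTRAL] * len(bits)      # 3) the producer returns the data to zero
    req = is_valid(word)
    trace.append((int(req), int(ack)))
    ack = not is_neutral(word)        # 4) acknowledge falls once rails are neutral
    trace.append((int(req), int(ack)))
    return trace


print(four_phase_transfer([1, 0, 1]))
```

Because request and acknowledge both return to zero at the end of the cycle, no signal toggles while the link is idle, which is consistent with the absence of dynamic power in idle states.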
B. LVDS Drivers
Current mode LVDS Drivers, shown in Fig. 8, are used to convert the data and parity values on the shared wires to LVDS pairs. The “LVDS Driver” is implemented such that it converts the input value D_in into LVDS signals for a valid input (WKUP = 1) and is fully turned off in standby mode (WKUP = 0). With WKUP = 0, DN and D are both set to logic "1" to turn off their gating PMOS transistors and turn on their gating NMOS transistors, pulling both LVDS_f and LVDS_t down to Gnd, with a common-mode LVDS pair voltage V_CM = 0. This will switch off its linked “LVDS Receiver” block following the proposed instant on/off scheme. Once there is a valid input (WKUP = 1), the common-mode voltage is restored to V_CM = V_ref to switch on the “LVDS Receiver” on the receiver side, and the “Driver” block will start to convert D_in into LVDS. The two V_B signals are biases that generate proper tail currents for the “CM-FB” and “Driver” blocks. Two resistors with value R = [...] Ω (with another two resistors placed at the input terminals of the “LVDS Receiver”) are used to set up the differential amplitude of the LVDS pair.
Fig. 8: Circuit implementation of the “TX LVDS Driver”.
Fig. 9: Receiver Token-Ring for decoding data/phase to event data in proposed bit-serial LVDS link.
C. Receiver Token-Ring
The architecture of the “RX Token-Ring” for processing and decoding the LVDS pairs LVDS_D and LVDS_P is shown in Fig. 9. Following the dual-rail asynchronous protocol and four-phase handshaking, the “RX Token-Ring” comprises “LVDS Receivers”, “Token-Ring” with odd and even token-cells and an “Output Buffer” block. The “LVDS Receivers” first digitize the LVDS pairs LVDS_D/P to digital sequential bits D.f/t and P.f/t, respectively. The token-cells then take bits one-by-one until the end of this event transmission. Once all token-cells have taken and buffered their bit values, the following “Output Buffer” will buffer the received event data RX.f<n−1:0> and RX.t<n−1:0> to the output bus AER.out and reset the “RX Token-Ring” for new data.
Fig. 10: Circuit implementation of RX token-cell based on 1-bit buffer. Each token-cell comprises Handshaking block, Validity Check block and Bit Buffer block.
Fig. 11: LVDS Receiver for digitizing differential LVDS to digital signals.
The circuit implementation of the “RX token-cell” following the dual-rail protocol and four-phase handshaking is shown in Fig. 10. Each RX token-cell comprises a “Handshaking”, a “Validity Check” and a “Bit Buffer” block. The odd token-cell will only take and process bit values with $P = \overline{D}$ and the even token-cell will only take and process bit values with $P = D$. When a new bit value comes to the token-ring with the proper data and parity relationship, for example D.f = P.t and D.t = P.f for $P = \overline{D}$, the currently activated odd token-cell will take this bit value and buffer it with its “Bit Buffer” block. After generating a valid output bit value (out.v = 1), en will be set to logic “0” to latch the output bit value and block the cell from taking a new bit value. Meanwhile, this token-cell will enable its following token-cell for a new token.
Fig. 12: Die photo of test chip with proposed event-driven bit-serial LVDS link in AMS 0.18 µm 1P6M process, in which the TX block occupies an area of 0.08 mm² and the RX block occupies an area of 0.06 mm².
D. LVDS Receivers
In order to meet the requirement of instant on/off by means of the LVDS common-mode voltage, we implemented the amplifier-based LVDS receivers with NMOS inputs. The circuit implementation of the proposed “LVDS Receiver” is shown in Fig. 11. It comprises an “Amp” block, a “Latch” block and a “Buffer” block. The “Amp” block is responsible for digitizing the LVDS signals. In standby mode, the “Amp” stage will be fully turned off with LVDS_f/t = 0. Once there is data from the transmitter side that needs to be transmitted (i.e., when V_CM = V_ref), the “Amp” stage will be turned on instantly. A latch stage with dynamic biases is implemented to latch the last bit value of the previous event data once the “Amp” stage is switched to sleep mode, so that the LVDS receiver will not wake up with a random output bit value. After a successful event transmission, V_CM of the LVDS pair will be switched from V_ref to Gnd, with V_P and V_N shifting to Vdd and Gnd respectively. This will strengthen the drive ability of the latch stage to store the current bit value when the “Amp” stage is turned off. As new event data arrives, the signals V_P and V_N will be shifted to near Vdd/2 to make the latch stage weaker, so that it can be modified by the new data. An active-low reset signal RstB is used to reset the circuit outputs to a proper initial condition ($P = D$) when powering up the chip.
IV. Experimental results
The proposed fully asynchronous event-driven bit-serial LVDS link was implemented using a standard 0.18 µm 1P6M CMOS process, occupying a silicon area of 0.14 mm². Figure 12 shows the die photo of the fabricated test chip. The whole “Transmitter” block including the “TX_Buffer” occupies an area of 0.08 mm², and the “Receiver” block including the “RX_Buffer” occupies an area of 0.06 mm². Additionally, a small spiking neural array with tunable output event rate is implemented to provide events for testing. A 32-bit router is implemented for routing events from Receiver to Transmitter to realize a transmission loop between 2 chips to explore peak transmission throughput.
Fig. 13: The setup for testing LVDS links between two chips for bidirectional communication.
Fig. 14: Transient signals of LVDS pairs captured on the receiver’s inputs: the traces D.f and D.t represent the differential signals for LVDS_D; the traces P.f and P.t represent the differential signals for LVDS_P; the D_Diff and P_Diff traces are the differential voltages of the two LVDS pairs; the D_CM and P_CM traces represent the common-mode voltages of the two LVDS pairs; the last plot shows the RX_Ack signal, which is the acknowledge signal from the target chip to the source chip, representing a successful event transmission.
Figure 13 shows a setup with two chips placed side-by-side for the experiments. With this setup, we transmitted sequences of 32-bit AER events bi-directionally between two chips, through four LVDS pairs: The signals
LVDS_D and LVDS_P of the first link were used to transmit events from Chip1 to Chip2, and LVDS_D and LVDS_P of the second link were used to transmit events from Chip2 to Chip1.
Transient signals of the LVDS pairs were observed and captured using a Tektronix DPO7000 oscilloscope, from the input terminals of the “LVDS Receivers”. As shown in Fig. 14, the LVDS_D plot shows data from the LVDS pair with differential signals D.f and D.t. The LVDS_P plot shows the parity LVDS pair with differential signals P.f and P.t. The D_Diff and P_Diff traces in the V_Diff plot are the differential voltages of the data LVDS and parity LVDS pairs, respectively. The D_CM and P_CM traces in the V_CM plot are the common-mode voltages of the data LVDS and parity LVDS pairs, respectively. The out.a plot shows the acknowledge signal from the target receiver chip for acknowledging a successful event transmission. Sequence bits are presented bit-by-bit following the LEDR protocol, via the data and parity differential signals D_Diff and P_Diff. The common-mode voltages of the two pairs, D_CM and P_CM, are reset to Gnd at the end of a successful event transmission and are quickly recovered with newly arriving events. During the recovery of the common-mode voltages of the LVDS pairs, the LSB of the previous event with $P = D$ is repeated for a sufficiently long time to guarantee that the receiver is fully and successfully switched on.
Fig. 15: Transient signals of LVDS pairs at receiver inputs: (a) differential mode of LVDS signals, (b) acknowledge signal from the receiver, (c) details of single event transmission signals.
Figure 15 shows the transient signals of the LVDS pairs at the input terminals of the receiver chip, captured by a Tektronix P6880 LVDS probe. The bit cycle is set to around 0.67 ns by tuning the delay cell t_d in the “TX Token-Ring” to achieve a bit-rate of 1.5 Gbps. The observed switch on/off speed of the receiver is approximately 0.45 ns and 0.5 ns, respectively, leading to a smaller latency for Address-Event (AE) transmission. As measured in Fig. 15(b), the latency needed for a successful transmission between chips (from switching on the Receiver to getting the acknowledge signal out.a from the receiver chip) for a 1.5 Gbps bit-rate is 31 ns. The period of successive event transmissions is 28 ns. Since the transmitter has locally pre-stored the out.a signal in the “Control Queue” block (see Section II-B), it will keep on sending event data without waiting for acknowledge signals from the receiver chip until the “Control Queue” is fully empty, to further decrease latency. This is evidenced in Fig. 15(a) and (b), as the second event transmission happens before the arrival of the first acknowledge signal out.a. In Fig. 15(c) we can observe that, for transmitting 32-bit event data with a bit-rate of 1.5 Gbps, the LVDS link will only be switched on for 25.6 ns, and will be switched off instantly on both transmitter and receiver sides, leading to a purely event-rate related power consumption.
TABLE I: Performance comparison of LVDS transceiver VLSI implementations.
                [16]        [18]             [14]             This work
Technology      [...] µm    90 nm            0.35 µm          0.18 µm
Power Supply    [...]       [...]            [...]            [...]
Area            [...] mm²   [...] mm²        [...] mm²        0.14 mm²
Clocked CDR     No          Yes              Yes              No
Bit Rate        [...]       [...]            [...]            1.5 Gbps
Event Rate      -           29.4 M Event/s   13.7 M Event/s   35.7 M Event/s
P_max           77 mA       40.1 mA          15.9 mA          22.9 mA
P_min           -           40.1 mA          0.4 mA           0.122 µA
P_max/P_min     -           1                39               187.7k
Fig. 16: Power consumption of asynchronous serial-bit LVDS link.
In Fig. 16 we plot the measured power consumption for different event transmission rates. The peak event rate that can be achieved in our experimental setup is 35.7 M Events/second (32-bit) with current consumption of 19.3 mA and 3.57 mA for the transmitter and receiver parts, respectively. The power consumption of both transmitter and receiver parts scales linearly with the event transmission rate. At a 10k Events/second rate, the current consumption of the transmitter and receiver blocks is 5.2 µA and 1.05 µA, respectively.
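The figures quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below derives the bit rate, peak event rate and link duty cycle from the measured timings, and then evaluates a simple linear current model between the leakage floor (80 nA transmitter, 42 nA receiver) and the peak values. The linear interpolation and the function name are assumptions made for illustration, not the authors' fitted model, and the transmitter/receiver assignment of the peak currents follows the text of this section.

```python
BIT_CYCLE_NS = 0.67      # bit cycle set by the delay cell t_d
EVENT_BITS = 32          # event width
EVENT_PERIOD_NS = 28.0   # measured period of successive events
LINK_ON_NS = 25.6        # time the link is switched on per event

bit_rate_gbps = 1.0 / BIT_CYCLE_NS             # bits per ns equals Gbps
payload_ns = EVENT_BITS * BIT_CYCLE_NS         # time spent on the 32 bits alone
peak_rate_hz = 1e9 / EVENT_PERIOD_NS           # events per second
duty_cycle = LINK_ON_NS / EVENT_PERIOD_NS      # fraction of time on at peak rate


def current_a(rate_hz, floor_a, peak_a, peak_hz=peak_rate_hz):
    """Linear interpolation between the leakage floor and the peak current."""
    return floor_a + (peak_a - floor_a) * rate_hz / peak_hz


tx_10k = current_a(1e4, floor_a=80e-9, peak_a=19.3e-3)
rx_10k = current_a(1e4, floor_a=42e-9, peak_a=3.57e-3)

print(f"bit rate   : {bit_rate_gbps:.2f} Gbps")
print(f"payload    : {payload_ns:.1f} ns of the {LINK_ON_NS} ns on-time")
print(f"peak rate  : {peak_rate_hz / 1e6:.1f} M events/s")
print(f"duty cycle : {duty_cycle:.2f}")
print(f"TX at 10k  : {tx_10k * 1e6:.2f} uA, RX at 10k: {rx_10k * 1e6:.2f} uA")
```

Under these assumptions the model lands close to the measured 5.2 µA and 1.05 µA at 10k Events/second, which is consistent with the linear scaling reported above.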
The current consumption can go down further to sub-µA for lower event rates (<1k Events/second), with a floor of 80 nA for the transmitter and 42 nA for the receiver, which is mainly dominated by the leakage current of the circuits.
Table I shows a performance comparison between different designs. However, the area and power consumption of the CDR circuits employed in the designs of [14], [18] are not reported, so significant additional silicon area and power consumption may be required for those designs.
V. Conclusions
While neuromorphic electronic systems have the potential of solving the memory bottleneck problem [19], by construction they also face an important I/O bottleneck problem: large scale neuromorphic system implementations are typically composed of multiple cores and/or multiple chips tiled together, with grid-like communication networks. To transmit address-events across these cores and chips and to sustain the required bandwidth, current implementations use multiple parallel AER buses (e.g., for North-South, East-West, and possibly diagonal links). In this paper we argued that full parallel or even word-serial AER protocols are not scalable, as they require a large number of pins/pads and large power consumption to quickly charge and discharge all these lines. To solve this problem, we proposed an ultra low-power fully asynchronous event-driven instant on/off bit-serial LVDS link, which is suitable for AER transmission in neuromorphic multi-chip systems. The proposed LVDS link uses LEDR encoding and a token-ring architecture to eliminate the need for clock-based CDR blocks with expensive on-chip DLL/PLL circuits, leading to a very compact and low-power circuit implementation. A novel scheme is proposed to implement low-latency event-driven transmission with a sub-ns instant on/off feature. Experimental results demonstrate how the proposed bit-serial LVDS link can achieve an event rate of 35.7 M Events/second with a bit-rate of 1.5 Gbps.
The power consumption of the proposed LVDS link is purely rate-dependent, with sub-µA current consumption for low event rates (e.g., ≈1k Events/second).
Acknowledgment
This work is supported by the EU ERC grant “NeuroP” (257219) and by the EU ICT grant “NeuRAM” (687299).
References
[1] S. Moradi, N. Qiao, F. Stefanini, and G. Indiveri, “A scalable multicore architecture with heterogeneous memory structures for dynamic neuromorphic asynchronous processors (DYNAPs),” Biomedical Circuits and Systems, IEEE Transactions on, pp. 1–17, 2017.
[2] J. Park, T. Yu, S. Joshi, C. Maier, and G. Cauwenberghs, “Hierarchical address event routing for reconfigurable large-scale neuromorphic systems,” IEEE Transactions on Neural Networks and Learning Systems, pp. 1–15, 2016.
[3] P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan, B. L. Jackson, N. Imam, C. Guo, Y. Nakamura, B. Brezzo, I. Vo, S. K. Esser, R. Appuswamy, B. Taba, A. Amir, M. D. Flickner, W. P. Risk, R. Manohar, and D. S. Modha, “A million spiking-neuron integrated circuit with a scalable communication network and interface,” Science, vol. 345, no. 6197, pp. 668–673, Aug 2014.
[4] S. Furber, F. Galluppi, S. Temple, and L. Plana, “The SpiNNaker project,” Proceedings of the IEEE, vol. 102, no. 5, pp. 652–665, May 2014.
[5] B. V. Benjamin, P. Gao, E. McQuinn, S. Choudhary, A. R. Chandrasekaran, J. Bussat, R. Alvarez-Icaza, J. Arthur, P. Merolla, and K. Boahen, “Neurogrid: A mixed-analog-digital multichip system for large-scale neural simulations,” Proceedings of the IEEE, vol. 102, no. 5, pp. 699–716, 2014.
[6] S.-C. Liu, T. Delbruck, G. Indiveri, A. Whatley, and R. Douglas, Event-based neuromorphic systems. Wiley, 2014.
[7] S.-C. Liu and T. Delbruck, “Neuromorphic sensory systems,” Current Opinion in Neurobiology, vol. 20, no. 3, pp. 288–295, 2010.
[8] N. Qiao, H. Mostafa, F. Corradi, M. Osswald, F. Stefanini, D. Sumislawska, and G. Indiveri, “A re-configurable on-line learning spiking neuromorphic processor comprising 256 neurons and 128k synapses,” Frontiers in Neuroscience, vol. 9, no. 141, 2015.
[9] M. Giulioni, P. Camilleri, M. Mattia, V. Dante, J. Braun, and P. D. Giudice, “Robust working memory in an asynchronously spiking neural network realized in neuromorphic VLSI,” Frontiers in Neuroscience, vol. 5, no. 149, 2012.
[10] E. Neftci, J. Binas, U. Rutishauser, E. Chicca, G. Indiveri, and R. Douglas, “Synthesizing cognition in neuromorphic electronic systems,” Proceedings of the National Academy of Sciences, vol. 110, no. 37, pp. E3468–E3476, 2013.
[11] R. Serrano-Gotarredona, M. Oster, P. Lichtsteiner, A. Linares-Barranco, R. Paz-Vicente, F. Gómez-Rodriguez, L. Camunas-Mesa, R. Berner, M. Rivas-Perez, T. Delbruck, S.-C. Liu, R. Douglas, P. Häfliger, G. Jimenez-Moreno, A. Civit-Ballcels, T. Serrano-Gotarredona, A. Acosta-Jiménez, and B. Linares-Barranco, “CAVIAR: A 45k neuron, 5M synapse, 12G connects/s AER hardware sensory–processing–learning–actuating system for high-speed visual object recognition and tracking,” IEEE Transactions on Neural Networks, vol. 20, no. 9, pp. 1417–1438, September 2009.
[12] E. Chicca, A. Whatley, P. Lichtsteiner, V. Dante, T. Delbruck, P. Del Giudice, R. Douglas, and G. Indiveri, “A multi-chip pulse-based neuromorphic infrastructure and its application to a model of orientation selectivity,” IEEE Transactions on Circuits and Systems I, vol. 5, no. 54, pp. 981–993, 2007.
[13] C. Brandli, R. Berner, M. Yang, S.-C. Liu, and T. Delbruck, “A 240 × [...] µs latency global shutter spatiotemporal vision sensor,” IEEE Journal of Solid-State Circuits, vol. 49, no. 10, pp. 2333–2341, 2014.
[14] C. Zamarreño-Ramos, R. Kulkarni, J. Silva-Martínez, T. Serrano-Gotarredona, and B. Linares-Barranco, “A 1.5 ns OFF/ON switching-time voltage-mode LVDS driver/receiver pair for asynchronous AER bit-serial chip grid links with up to 40 times event-rate dependent power savings,” Biomedical Circuits and Systems, IEEE Transactions on, vol. 7, no. 5, pp. 722–731, 2013.
[15] M. E. Dean, T. E. Williams, and D. L. Dill, “Efficient self-timing with level-encoded 2-phase dual-rail (LEDR),” in Proceedings of the 1991 University of California/Santa Cruz conference on Advanced research in VLSI. MIT Press, 1991, pp. 55–70.
[16] J. Teifel and R. Manohar, “A high-speed clockless serial link transceiver,” in Asynchronous Circuits and Systems, 2003. Proceedings. Ninth International Symposium on. IEEE, 2003, pp. 151–161.
[17] C. Zamarreno-Ramos, T. Serrano-Gotarredona, and B. Linares-Barranco, “A 0.[...] µm sub-ns wake-up time ON-OFF switchable LVDS driver-receiver chip I/O pad pair for rate-dependent power saving in AER bit-serial links,” Biomedical Circuits and Systems, IEEE Transactions on, vol. 6, no. 5, pp. 486–497, 2012.
[18] C. Zamarreno-Ramos, R. Serrano-Gotarredona, T. Serrano-Gotarredona, and B. Linares-Barranco, “LVDS interface for AER links with burst mode operation capability,” in Circuits and Systems, 2008. ISCAS 2008. IEEE International Symposium on. IEEE, 2008, pp. 644–647.
[19] G. Indiveri and S.-C. Liu, “Memory and information processing in neuromorphic systems,” in