Optimum Reconfiguration of Routing Interconnection Network in APSoC Fabrics

Abstract

This paper presents an automated algorithm for optimum configuration of the routing interconnection network in Xilinx Zynq-7000 All Programmable System-on-Chip (APSoC) fabrics. A method to configure circuits with optimum routing resources is presented, along with their performance parameters with and without the proposed algorithm. The proposed algorithm enables full control over routing resources so that different interconnection types can be used to create a routing-based circuit-under-test. The algorithm proposes routing techniques through the 2-D array of switch matrices inside the interconnection network and automatically identifies the programmable interconnection points associated with a node. An experimental setup is proposed to measure performance parameters such as slack time and power with and without the applied algorithm on the APSoC routing resources. The proposed setup requires no external equipment, such as manufactured equipment or external instruments, for performance measurement.



Mostafa Darvishi,

Member IEEE


Index Terms: All Programmable System-on-Chip, Xilinx Zynq-7000, Routing Interconnection Network.

M. Darvishi obtained the Ph.D. in Digital Microelectronics from the Electrical Engineering Department of Polytechnique Montreal, QC, Canada, H3C 3A7 (e-mail: [email protected]).

I. INTRODUCTION

Field Programmable Gate Arrays (FPGAs) have attracted a lot of interest in various domains due to their high circuit density and growing performance capability. These semiconductor devices are structured as an array of configurable logic blocks (CLBs) connected via a programmable routing interconnection network [1], [2]. Advances in semiconductor technology have enabled the integration of programmable logic with complex systems on a single silicon die. These new Xilinx All Programmable System-on-Chip (APSoC) devices, which combine a programmable logic (PL) part with a processing system (PS) part, have been used extensively in different applications in recent years [1]–[5]. The PL part of the APSoC is the FPGA itself and is used to implement different digital circuits and systems. The PS part contains a microcontroller. Hence, the functionality of any system implemented in an APSoC can be partitioned between the PL and the PS, while the PS can also take control of the PL. Among the APSoC devices, the Xilinx Zynq-7000, fabricated in the Taiwan Semiconductor Manufacturing Company's 28 nm technology node, has been widely used for different applications in recent years [6]–[8]. Routing resources in Xilinx APSoC fabrics are controlled by SRAM cells that are called configuration bits [9].

This paper presents an automated algorithm for optimum configuration of the routing resources in the PL of a Zynq-7000 APSoC device. We propose a method to configure circuits with optimum routing resources and report performance validation results with and without the proposed algorithm. The proposed algorithm enables full control over routing resources, so that different interconnection types can be used to create optimum routing paths for the implemented circuits. The automated algorithm is implemented in the Xilinx Vivado scripting tool. The algorithm also proposes a technique for traversing switch matrices (SMs) inside the interconnection network and automatically identifies the programmable interconnection points (PIPs) associated with an input or output pin of an SM. It is noted that the default routing optimizer of the Xilinx Vivado tool does not search for the optimum routing paths of a design once the timing constraints are met. Even in the case of a timing violation, the Vivado optimizer merely reports the violated paths and, by default, proposes no solution other than increasing the path delays. We also propose an experimental setup to measure performance parameters such as slack time and power with and without the applied algorithm. No external equipment (e.g., manufactured equipment or external instruments) is required for such measurements.

This paper is structured as follows. Some background information is presented in Section II. An overview of the routing resources in the Zynq-7000 APSoC is presented in Section III. The proposed algorithm for routing resources in the Zynq-7000 APSoC is presented in Section IV. The experimental setup and results are discussed in Section V. Conclusions and future work are presented in Section VI.

II. BACKGROUND

This paper focuses on developing an automated algorithm for optimum configuration of the interconnection network in the PL resource of a Zynq-7000 device. This state-of-the-art APSoC device offers specialized modules merged with the PS in a single die. It is noted that almost 98% of all memory elements in the PL part are configuration bits, of which more than 90% control the routing resources [1], [5].

The routing interconnection network in the PL section is configured through a 2-D array of SMs. An input pin of each SM comprises a set of PIPs, where a PIP, a CMOS transistor switch, can be programmably turned on or off to add or remove interconnects throughout the network.

Recent studies confirm the extensive involvement of the routing interconnection network in a variety of applications such as clock tuning [10], Time-to-Digital Converters (TDC) [11], [12], Physical Unclonable Functions (PUF) [13], and True Random Number Generators (TRNG) [14], and have shown that full control of the routing path between two given points of the circuit is an essential requirement. It is noted that the placement of circuit elements can be fully controlled by the designer, while routing resources are less controllable.

To make the proposed routing algorithm easier to understand and to simplify performance validation, routing-based ring oscillators (ROs) are implemented as the circuits under test (CUTs).

Fig. 1: Topology of CLB and INT tiles in Xilinx APSoCs. In this example, logical net NetA, made of seven nodes each consisting of SINGLE (1L) interconnects, connects a source flip-flop to a destination LUT.

III. OVERVIEW OF ROUTING RESOURCES IN ZYNQ-7000 APSOC

Generic APSoC fabrics consist of some fundamental logic cores linked via an interconnection network. Three types of resources are commonly used in APSoC architectures: logic resources, routing interconnects, and switch matrices [15]. In this paper, we focus on the routing resources and switch matrices, which form the basis of the interconnection network.

A. Logic and Interconnect Tiles Resources

Logic resources in APSoCs are linked via an interconnection network comprised of different interconnection types. Some interconnects are dedicated to specific logic or functions and the rest are global. Interconnects in the network span both the horizontal and vertical planes, traversing the gate array from west to east and north to south, respectively. SMs are used to link the various interconnects and transmit data inside the fabric.

Fig. 1 shows the topology of the CLB and interconnect (INT) tiles in Xilinx 7-Series APSoCs [3], [5]. In this scheme, the planar SM has an injective mapping in which each input node on the right side is connected to only one node on its left side. The INT tile comprises one Wilton SM (WSM), in which each input node can be mapped to several output nodes and vice versa [16]. A net, such as NetA in Fig. 1, comprises a list of nodes and represents a logic net. A WSM input node sends its data signal to several outgoing nodes (called downhill nodes), and one of the PIPs connected to an output node is configured to receive the data signal from one of its multiple incoming nodes (called uphill nodes). A PIP specifies a configurable connection between an SM input and an SM output and is built from a programmable CMOS transistor, as depicted in Fig. 1 [17].
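The difference between the planar and Wilton mappings, and the uphill/downhill terminology, can be illustrated with a small model. The following is a minimal sketch, not the device's actual connectivity: the class SwitchMatrix, its methods, and the node names IN0, OUT0, and so on are hypothetical and chosen only for illustration.

```python
class SwitchMatrix:
    """Toy switch matrix: a set of PIPs, each joining one input node to one output node."""

    def __init__(self):
        self._downhill = {}  # input node  -> output nodes it can drive
        self._uphill = {}    # output node -> input nodes that can drive it
        self._driver = {}    # output node -> input whose PIP is currently on

    def add_pip(self, src, dst):
        self._downhill.setdefault(src, set()).add(dst)
        self._uphill.setdefault(dst, set()).add(src)

    def downhill(self, src):
        return set(self._downhill.get(src, ()))

    def uphill(self, dst):
        return set(self._uphill.get(dst, ()))

    def turn_on(self, src, dst):
        """Enable the PIP src -> dst; each output keeps a single active driver."""
        if dst not in self._downhill.get(src, ()):
            raise ValueError(f"no PIP {src} -> {dst}")
        self._driver[dst] = src

    def driver(self, dst):
        return self._driver.get(dst)

    def is_injective(self):
        """True when every input has exactly one output and no output is shared."""
        one_out = all(len(d) == 1 for d in self._downhill.values())
        one_in = all(len(u) == 1 for u in self._uphill.values())
        return one_out and one_in


# Planar-style matrix: one-to-one mapping between inputs and outputs.
planar = SwitchMatrix()
for i in range(3):
    planar.add_pip(f"IN{i}", f"OUT{i}")

# Wilton-style matrix: inputs fan out, and outputs can choose among inputs.
wilton = SwitchMatrix()
for src, dst in [("IN0", "OUT0"), ("IN0", "OUT1"), ("IN1", "OUT1"), ("IN1", "OUT2")]:
    wilton.add_pip(src, dst)
wilton.turn_on("IN1", "OUT1")  # OUT1 now listens to IN1 only

print(planar.is_injective(), wilton.is_injective())
print(sorted(wilton.downhill("IN0")), sorted(wilton.uphill("OUT1")))
```

In this model a PIP is just an edge that can be switched on, which is enough to show why an output node of a Wilton SM needs a configured PIP to choose among several uphill nodes.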

B. Interconnect Types Available in 7-Series FPGAs

Xilinx APSoCs generally consist of fifteen types of interconnects for data signal transmission throughout the PL side of the fabric. It is noted that these interconnects do not apply to the PS side, because the PS consists of multi-core microcontroller cores with their own dedicated interconnection network and is outside the scope of this paper. Interconnects linking the Wilton SMs are categorized as follows [3], [5]:

• SINGLE (1L): unidirectional interconnects that span 1 CLB;
• DOUBLE (2L): unidirectional interconnects that span 1 or 2 CLBs;
• HQUAD (4L): unidirectional interconnects that span 4 CLBs;
• VQUAD: unidirectional interconnects that span 6 CLBs;
• BOUNCEACROSS: unidirectional interconnects that span 1 CLB only vertically;
• VLONG: bidirectional long interconnects that span 20 CLBs vertically;
• VLONG12: bidirectional long interconnects that span 12 CLBs vertically;
• HLONG: bidirectional long interconnects that span 20 CLBs horizontally;
• GLOBAL: homogeneous and unidirectional interconnects that span 20 CLBs vertically and are dedicated to routing specific signals (clock, reset, enable, etc.);
• BENTQUAD: unidirectional interconnects that bend and span 6 CLBs;
• PINFEED: short interconnects that link a Wilton SM to a planar SM (coming into the planar SM);
• OUTBOUND: short interconnects that link a planar SM to a Wilton SM (outgoing from the planar SM); some of them also span 1 CLB;
• BOUNCEIN: short internal Wilton SM interconnects at some input nodes, used to bounce a signal;
• PINBOUNCE: short internal Wilton SM interconnects at some output nodes, used to bounce a signal;
• HVCCGNDOUT: GND and VCC interconnects that link Wilton SM nodes to logic ‘0’ or logic ‘1’, respectively.

TABLE I: TYPES AND NUMBER OF INTERCONNECTS LINKED TO EACH WILTON SM IN XILINX APSOCS

Interconnect Type    Number of Interconnects Connected to Each Wilton SM
DOUBLE               70
SINGLE               68
BOUNCEACROSS         17
VLONG                3
HLONG                3
PINFEED              42
OUTBOUND             24*
BOUNCEIN             9
PINBOUNCE            16
GLOBAL               12
HQUAD                17
BENTQUAD             34
VQUAD                18
VLONG12              2
HVCCGNDOUT           2

Table I lists the number of interconnects of each type connected to a Wilton SM [3], [5]. Four of the 24 OUTBOUND interconnects span only one CLB, while the rest link planar SMs to the Wilton SMs (* in Table I). Interconnects of the same type may have different topologies. A logical net, for example NetA shown in Fig. 1, comprises a list of nodes {Source CLBLM_M_A LOGIC_OUTS2 SW1BEG1 SW1BEG1 NN1BEG1 NN1BEG1 EE1BEG1 EE1BEG1 IMUX7 IMUX7 SW1BEG1 SW1BEG1 NW1BEG1 SW1BEG1 LOGIC_OUTS2 CLBLM_M_D6 Destination} connecting a flip-flop source to a LUT destination between two cross CLBs. NetA has seventeen nodes made of SINGLE (1L) interconnects, each of which spans only 1 SM.
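To give a sense of scale for Table I, the counts can be tallied directly. This is a small worked example using only the numbers in Table I and the OUTBOUND split stated in the text; the variable names are illustrative.

```python
# Interconnect counts per Wilton SM, copied from Table I.
TABLE_I = {
    "DOUBLE": 70, "SINGLE": 68, "BOUNCEACROSS": 17, "VLONG": 3, "HLONG": 3,
    "PINFEED": 42, "OUTBOUND": 24, "BOUNCEIN": 9, "PINBOUNCE": 16,
    "GLOBAL": 12, "HQUAD": 17, "BENTQUAD": 34, "VQUAD": 18, "VLONG12": 2,
    "HVCCGNDOUT": 2,
}

# Per the text, four OUTBOUND interconnects span one CLB; the rest link
# planar SMs to Wilton SMs.
OUTBOUND_ONE_CLB = 4
outbound_planar_to_wsm = TABLE_I["OUTBOUND"] - OUTBOUND_ONE_CLB

total = sum(TABLE_I.values())
ranked = sorted(TABLE_I.items(), key=lambda kv: kv[1], reverse=True)

print(f"{len(TABLE_I)} types, {total} interconnects per Wilton SM")
print("most numerous:", ranked[:3])
print("OUTBOUND linking planar SM to Wilton SM:", outbound_planar_to_wsm)
```

The tally shows that SINGLE and DOUBLE interconnects together account for 138 of the 337 interconnects attached to each Wilton SM.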

C. PIP Notation and Interconnect Coordinates

In Zynq-7000 fabrics, a PIP is usually identified by the name of the interconnect it is connected to together with the interconnect coordinates. The suffix BEG or END is added to the PIP's name depending on whether the interconnect's tail is the beginning or the end of the interconnect. For instance, a PIP in the tile with coordinates X=5 and Y=15, connecting the beginning (BEG) of a SINGLE (1L) interconnect heading northwest (NW) to the beginning (BEG) of a DOUBLE (2L) interconnect heading southeast (SE), is identified as:

pip INT_R_X5Y15 NW1BEG0 -> SE2BEG1

Fig. 2: Proposed algorithm for WSMs.

Fig. 3: Proposed algorithm for WSMs.
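The naming convention can be made concrete with a small parser. This is a sketch that assumes only the simple form used in the example: a two-letter compass direction, a length digit, BEG or END, and a trailing index. The functions parse_pip and parse_wire and the two tuple types are hypothetical helpers, not part of Vivado, and other wire names (for example IMUX_L8 or NL1BEG_N3) are deliberately left unparsed.

```python
import re
from typing import NamedTuple, Optional


class Pip(NamedTuple):
    tile_type: str  # e.g. "INT_R"
    x: int          # tile X coordinate
    y: int          # tile Y coordinate
    src: str        # wire on the input side of the PIP
    dst: str        # wire on the output side of the PIP


class Wire(NamedTuple):
    direction: str  # two compass letters, e.g. "NW"
    length: int     # number before BEG/END (1 = SINGLE, 2 = DOUBLE)
    end: str        # "BEG" or "END"
    index: int      # index assigned by the tool within the category


_PIP_RE = re.compile(
    r"^(?:pip\s+)?(?P<tile>[A-Z_]+)_X(?P<x>\d+)Y(?P<y>\d+)"
    r"\s+(?P<src>\S+)\s*->\s*(?P<dst>\S+)$"
)
_WIRE_RE = re.compile(r"^(?P<dir>[NSEW]{2})(?P<len>\d+)(?P<end>BEG|END)(?P<idx>\d+)$")


def parse_pip(text: str) -> Pip:
    """Split a PIP name into tile type, coordinates, and the two wire names."""
    m = _PIP_RE.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognized PIP name: {text!r}")
    return Pip(m["tile"], int(m["x"]), int(m["y"]), m["src"], m["dst"])


def parse_wire(name: str) -> Optional[Wire]:
    """Decode a directional wire name; return None for names outside this simple form."""
    m = _WIRE_RE.match(name)
    if m is None:
        return None
    return Wire(m["dir"], int(m["len"]), m["end"], int(m["idx"]))


pip = parse_pip("pip INT_R_X5Y15 NW1BEG0 -> SE2BEG1")
print(pip)
print(parse_wire(pip.src), parse_wire(pip.dst))
```

Applied to the example above, the parser recovers the tile coordinates (5, 15), a length-1 wire on the source side, and a length-2 wire on the destination side.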

The number before BEG indicates the length of the interconnect being connected (2 for the DOUBLE interconnect and 1 for the SINGLE interconnect in this example). The last number is an index automatically assigned by the Vivado tool to distinguish interconnects of the same category.

IV. PROPOSED ROUTING ALGORITHM FOR SWITCH MATRICES

It is possible to determine all the PIPs associated with each node of NetA in Fig. 1 that are connected to an input or output pin of a WSM. A trivial method is to select an input/output pin of a WSM and manually check the PIP Junction Properties in Vivado. This property identifies the PIPs of only one pin of a WSM at a time. This means that it is not possible to identify the PIPs associated with several pins of a WSM, or the PIPs in different WSM levels, simultaneously. This is very time consuming and error prone, owing to possible mistakes in net selection for designs that include several nets. Automatic determination of all PIPs associated with each node facilitates the analysis of the routing interconnection network in APSoC devices.

TABLE II: EXAMPLE OF EXTRACTED PIPS CONNECTED TO LOGIC_OUTS2 AND NN1BEG3 INTERCONNECTS IN WSM1 AND WSM4 RESULTED BY THE PROPOSED ALGORITHM

PIPs in wsm_level=1:              PIPs in wsm_level=2:
node connected to LOGIC_OUTS2     node connected to NN1BEG3
WW4BEG0 , NW2BEG0                 WW4BEG0 , LV_L0
WW2BEG0 , NR1BEG0                 WR1BEG1 , WW2BEG0
WR1BEG1 , NN6BEG0                 NL1BEG_N3 , WL1BEG2
WN1BEG_N3 , NN1BEG3               NW6BEG0 , SW6BEG3
SW6BE0 , NE6BEG0                  NW2BEG0 , SW2BEG3
SW2BEG0 , NE2BEG0                 NN6BEG0 , SS6BG3
SS6BEG0 , IMUX_L8                 NN2BEG0 , SS2BEG3
SS2BEG0 , IMUX_L40                NE6BEG0 , SR1BEG1
SR1BEG1 , IMUX_L32                LV_L18 , ER1BEG_S0
SL1BEG0 , IMUX_L24
SE6BEG0 , IMUX_L16
SE2BEG0 , IMUX_L0
NL1BEG_N3 , ER1BEG1
BYP_ALT0 , EL1BEG_N3
FAN_ALT0 , EE4BEG0
NW6BEG0 , EE2BEG0

An algorithm is proposed for WSMs that automatically extracts all the PIPs in all WSM levels associated with each pin for any logic net, as described in Fig. 2. In this pseudo-code, the source and destination logic cells are generated and placed with the FPGA floorplanner in STEP 1. The notion of logic cell here refers to a slice logic element, such as a flip-flop or LUT. Then, a net is routed between the source and destination (STEP 2).

The proposed pseudo-code relies on the Xilinx Design Constraint (XDC) file, which provides full control of the placement and routing of a net in Vivado. Instead of relying on the automated place and route tasks performed by the tool, the designer can specify a group of nodes that the net should pass through. This can be achieved by employing the Tool Command Language (TCL) scripting available in Vivado. The FIXED_ROUTE property allows the generation of a list of nodes to configure a net. This property should end with a specified name for the net that is going to be routed. For example, NetA in Fig. 1 is configured with the TCL script shown in Fig. 3. The indices associated with each node are pre-defined in Vivado to distinguish interconnects in the same category and cannot be changed.
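The idea of collecting PIPs level by level along a net can be sketched on a toy connectivity map. This is not the pseudo-code of Fig. 2, which is not reproduced here; it is a minimal illustration under the assumption that the connectivity of each node is already available as a dictionary. The function pips_per_level and the dictionary CONNECTIVITY are hypothetical, and the dictionary holds only a few entries copied from Table II.

```python
def pips_per_level(route, connectivity):
    """Return {wsm_level: (node, sorted PIP names)} for the nodes of a routed net.

    route        -- node names in the order the net traverses them
    connectivity -- node name -> set of PIP names reachable from that node
    """
    levels = {}
    for level, node in enumerate(route, start=1):
        if node not in connectivity:
            raise ValueError(f"no connectivity data for node {node!r}")
        levels[level] = (node, sorted(connectivity[node]))
    # Sanity check: each hop of the route must use a PIP of the previous node.
    for here, nxt in zip(route, route[1:]):
        if nxt not in connectivity[here]:
            raise ValueError(f"{nxt!r} is not reachable from {here!r}")
    return levels


# A few entries from Table II only; the real lists are longer.
CONNECTIVITY = {
    "LOGIC_OUTS2": {"WW4BEG0", "NW2BEG0", "WW2BEG0", "NN1BEG3"},
    "NN1BEG3": {"WW4BEG0", "LV_L0", "WR1BEG1", "WW2BEG0"},
}

result = pips_per_level(["LOGIC_OUTS2", "NN1BEG3"], CONNECTIVITY)
for level, (node, pips) in result.items():
    print(f"wsm_level={level}: node {node} -> {', '.join(pips)}")
```

Walking the whole route in one pass is what removes the need to inspect one pin at a time by hand; in a real flow the connectivity data would come from the tool's device model rather than a hand-written dictionary.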
Table II shows a partial result for the extracted PIPs connected to the LOGIC_OUTS2 and NN1BEG3 interconnects in WSM1 and WSM4, as produced by the proposed algorithm.

Fig. 4 shows a routing diagram of three different sets of ROs preliminarily implemented on the Xilinx APSoC [3], [5] “without” using the proposed routing optimization algorithm. Fig. 4(a) to Fig. 4(c) show the routing diagrams of the horizontal ROs, the BENTQUAD RO, and the vertical ROs, respectively. In this figure, each Wilton SM is shown as a circle (SM) and each interconnect is shown as an arrow (I). The proposed RO architecture makes use of long routing paths and only two logic components.

Fig. 4: Routing diagram for three different sets of ROs preliminarily implemented on the Xilinx APSoC “without” the proposed algorithm: (a) diagram of 1L, 2L, 4L and LONG interconnects, (b) diagram of BENTQUAD interconnects, and (c) diagram of BOUNCEACROSS and VQUAD interconnects [3], [5].

V. EXPERIMENTS AND RESULTS

In the experiments, different ROs were implemented on the PL side of the Zynq-7000 APSoC, and their routing net delays as well as their frequencies were measured at run time.

A. Setup and preliminary implementation

The routing net delay and frequency of each individual RO are measured using the Xilinx Integrated Logic Analyzer (ILA). Fig. 5 shows the schematic of the CUTs implemented on the Zynq-7000 ZC702 APSoC using the different types of interconnections reported in Table I. It is noted that not all the interconnects reported in Table I are usable for RO implementation, because some of them do not span more than a single WSM (e.g., the PINFEED interconnect). Two ROs per type were implemented, resulting in a total of 14 ROs. Table III shows the performance results of each RO “without” applying the proposed algorithm.

Fig. 5: Experimental setup.

TABLE III: PERFORMANCE RESULTS OF THE IMPLEMENTATION OF ROS USING THE PROPOSED ALGORITHM

RO Type, Frequency, Net Delay
1L 48912 398 51
1L 48909 402 52
2L 22541 696 56
2L 22541 696 56
4L 6399 183 60
4L 6398 182 60
LONG 16119 521 27
LONG 16121 516 26
BENTQUAD 23551 611 22
BENTQUAD 23548 615 23
BOUNCEACROSS 29852 489 25
BOUNCEACROSS 29851 490 25
VQUAD 29790 516 21
VQUAD 29789 519 22
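The reason an RO is a convenient circuit under test is that its oscillation frequency depends directly on the delay around the loop. The sketch below uses the standard relation for a ring with a single inverting stage, where one period equals two traversals of the loop. That structure is an assumption, the function names are illustrative, and the numeric values are arbitrary and are not taken from Table III.

```python
def ro_frequency_hz(net_delay_s: float, logic_delay_s: float = 0.0) -> float:
    """Oscillation frequency of a single-inverter ring oscillator.

    The signal must travel the loop twice (one rising and one falling edge)
    per period, so f = 1 / (2 * (routing delay + logic delay)).
    """
    loop_delay = net_delay_s + logic_delay_s
    if loop_delay <= 0:
        raise ValueError("loop delay must be positive")
    return 1.0 / (2.0 * loop_delay)


def loop_delay_s(frequency_hz: float) -> float:
    """Inverse relation: total loop delay implied by a measured frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 / (2.0 * frequency_hz)


# Arbitrary illustrative values: 1.8 ns of routing delay plus 0.2 ns of logic delay.
f = ro_frequency_hz(1.8e-9, 0.2e-9)
print(f"{f / 1e6:.1f} MHz, loop delay {loop_delay_s(f) * 1e9:.2f} ns")
```

Under this relation, any reduction in routing net delay shows up as a proportionally higher RO frequency, which is why both quantities are reported together.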

B. Implementation with the Proposed Algorithm

In the next step, the ROs were implemented again while the TCL script containing the algorithm was sourced during the design implementation process. Fig. 6 shows the updated routing diagram of the ROs implemented using the proposed algorithm. It is noted that the optimized logic cell coordinates and routing topology have also been updated. Table IV shows the new measurements for the updated design using the proposed algorithm, as shown in Fig. 6. It shows an improvement in the parameters, especially in the number of utilized interconnects and the net delay.

Fig. 6: Routing diagram for three different sets of ROs implemented on the Xilinx APSoC “with” the proposed algorithm: (a) diagram of 1L, 2L, 4L and LONG interconnects, (b) diagram of BENTQUAD interconnects, and (c) diagram of BOUNCEACROSS and VQUAD interconnects.

VI. CONCLUSION

This paper has presented a detailed analysis of an optimization algorithm applied to the routing resources in the PL part of a Zynq-7000 APSoC that includes an SRAM-based FPGA (7Z020-CLG484) available on a ZC702. Fourteen ROs have been configured using two logic cells and routing resources of different interconnection types. The frequency and net delay of the ROs have been measured using the ILA and delay measurement scripts in the Vivado tool. The measurements have been performed for two scenarios: implementation of the ROs “without” applying the proposed algorithm, and implementation “with” the algorithm deployed. In the former, the Vivado self-optimizer took care of the placement and routing, while in the latter the proposed algorithm overrode it with a new placement and routing topology.