AdEle: An Adaptive Congestion-and-Energy-Aware Elevator Selection for Partially Connected 3D NoCs
Ebadollah Taheri, Ryan G. Kim, and Mahdi Nikdast
Department of Electrical and Computer Engineering, Colorado State University, Fort Collins, CO 80523, USA
Abstract: By lowering the number of vertical connections in fully connected 3D networks-on-chip (NoCs), partially connected 3D NoCs (PC-3DNoCs) help alleviate reliability and fabrication issues. This paper proposes a novel, adaptive congestion- and energy-aware elevator-selection scheme called AdEle to improve the traffic distribution in PC-3DNoCs. AdEle employs an offline multi-objective simulated-annealing-based algorithm to find good elevator subsets and an online elevator-selection policy to enhance elevator selection during routing. Compared to the state-of-the-art techniques under different real-application traffic and configuration scenarios, AdEle improves the network latency by 14.9% on average (up to 21.4%) with less than 10.5% energy consumption overhead.
Index Terms: Partially connected 3D networks-on-chip, through-silicon via, simulated annealing, elevator selection.
I. INTRODUCTION
Network-on-chip (NoC) has become the prevailing solution to enable scalable on-chip communication in manycore systems. Moreover, with the advances in three-dimensional (3D) integration technologies, 3D NoCs are emerging to further improve the heterogeneity and integration density by vertically stacking multiple dies connected with an efficient die-to-die interconnect [1]. Among different vertical interconnect technologies, through-silicon vias (TSVs) promise high bandwidth and low power [2]–[4].

TSV-based 3D NoCs have been proposed for various applications (e.g., [5], [6]). However, vertical links in TSV-based 3D NoCs use multiple TSVs in a bundle, resulting in high area overhead due to the large TSV interconnect-pitch and keep-out-zone requirements [7]. Also, TSVs are particularly susceptible to electromigration and capacitive crosstalk-induced issues [8], [9]. Therefore, 3D NoC architectures with TSVs at every router (i.e., fully connected) impose higher design complexity, fabrication costs, and performance degradation [1], [4]. Addressing such challenges has motivated the development of 3D NoCs with fewer TSV-based vertical links, also known as partially connected 3D NoCs (PC-3DNoCs) [1], [10], [11].

Nevertheless, PC-3DNoCs introduce some new design challenges because of their partial vertical connectivity [11], [12]. In particular, the vertical links (a.k.a. the elevators) must be shared among multiple routers, potentially creating traffic hotspots at the elevators and increasing the network latency [1]. To balance the traffic at these hotspot elevators, an adaptive routing technique is needed to select less-utilized elevators without detouring far from the minimal path (i.e., the elevator-selection problem). Yet, initial routing solutions in PC-3DNoCs (e.g., Elevator-First routing [10]) naïvely select the nearest elevator without considering the traffic, resulting in unbalanced elevator utilization.
To reduce congestion, advanced methods (e.g., CDA [12]) use global traffic information to improve the traffic distribution during runtime. However, retrieving global traffic information increases both the hardware overhead and network traffic.

This paper addresses the elevator-selection problem in PC-3DNoC routing techniques by developing, for the first time, a novel, congestion- and energy-aware adaptive elevator-selection scheme called AdEle. AdEle works in two stages to balance the traffic with minimal overhead: an offline elevator-set optimization and an online elevator-selection policy. In the offline elevator-set optimization, AdEle uses a multi-objective simulated-annealing-based optimization algorithm (AMOSA [13]) to collectively choose an optimized subset of elevators for each source router that minimizes the average latency and energy under an assumed traffic scenario. During runtime, each router monitors its local traffic and selects one elevator from its subset to improve the latency of the network. AdEle employs a low-overhead local traffic monitoring technique that examines the blocking as a proxy for path congestion, balancing the elevator traffic while eliminating the overhead of global traffic monitoring used in other approaches. Our results, simulated using different real-application traffic and configuration scenarios, show the promise of AdEle compared to the state-of-the-art techniques: on average, AdEle improves the network latency by 14.9% (up to 21.4%) with only 10.3% (up to 10.5%) energy consumption overhead.

The rest of the paper is organized as follows. We review the recent related work on PC-3DNoCs in Section II. Section III discusses the elevator-selection problem and its complexity in PC-3DNoCs and details our proposed technique and its implementation. We present our simulation results in Section IV. Finally, Section V concludes the paper.

II. BACKGROUND AND RELATED WORK
Employing conventional dimension-order routing algorithms in PC-3DNoCs will result in deadlock because of the irregular topology in such networks. To prevent deadlock, the Elevator-First routing algorithm [10] employs two virtual networks to break cyclic dependencies. Moreover, as the elevator-less routers cannot directly send packets to other layers, an elevator is selected for each packet to facilitate the inter-layer communication. Leveraging such a principle, several routing algorithms have been proposed for PC-3DNoCs [3], [14]. However, they follow an elevator-selection policy that ignores elevators' load distribution and the minimal path. This can be especially harmful for PC-3DNoCs with non-uniform elevator placements, a small number of elevators, or non-uniform traffic distributions. Adaptive elevator-selection techniques have been proposed [4], [11], [15] but mainly focus on elevator failure concerns. These strategies select the closest non-faulty elevator to the source without considering the elevator's congestion, causing them to suffer from high energy and latency costs.

Fig. 1. (a) An example PC-3DNoC with three elevators ($e_1$, $e_2$, and $e_3$). The routing path from S to D based on the Elevator-First algorithm [10] (dotted-red line, non-minimal path) and the minimal path (blue-solid line) are shown. The middle-layer routers are colored based on their Elevator-First selected elevator. (b) Traffic load on each router in the middle layer: the $e_2$ elevator is highly congested because of the inefficient elevator selection in the Elevator-First algorithm.

To improve the traffic distribution in PC-3DNoCs, [16] proposed an optimized elevator-selection scheme using the Tabu search algorithm. However, the offline Tabu optimization cannot capture the dynamics of the runtime network traffic. Also, the search algorithm ignores the network energy efficiency during the elevator selection.
In [12], an online elevator-selection scheme called CDA selects the elevator based on the buffer utilization of the routers between a source and the elevator. However, CDA requires online global information of the network buffer utilization, which imposes high latency and hardware overheads to share this information.

Considering the aforementioned works, an efficient elevator-selection solution is essential but yet to be addressed for PC-3DNoCs. We take on this challenge by developing a novel, adaptive congestion- and energy-aware elevator-selection scheme (AdEle). Offline elevator-selection approaches enjoy low overhead while online approaches achieve better network latency and energy consumption. Accordingly, AdEle combines the benefits of both approaches while also considering energy consumption. On top of being energy-aware, AdEle includes elevator redundancy and online policies to accommodate dynamic traffic behavior. We will show that using a set of elevators instead of one elevator for each router can greatly improve network performance. Also, our proposed approach only utilizes local information of routers to effectively manage elevator congestion with low overheads.

III. PROPOSED ELEVATOR-SELECTION SCHEME: ADELE

This section details our proposed adaptive congestion- and energy-aware elevator-selection scheme. As shown in Fig. 2, AdEle uses an offline multi-objective simulated-annealing-based algorithm (AMOSA) to find an optimal subset of elevators for each router, and an online elevator-selection algorithm to improve elevator selection in the presence of runtime traffic. The following discusses the novel contributions of AdEle.

Fig. 2. An overview of our proposed elevator-selection scheme: AdEle. Offline optimization (Section III.B): the elevator configuration, the traffic pattern, and the objectives (utilization and distance) feed the elevator-set search (AMOSA), which yields an elevator subset (one per router). Online elevator selection (Section III.C), for each router: traffic monitor, enhanced round robin, and low-traffic override.
A. Motivation: Routing in PC-3DNoCs
In PC-3DNoCs, because of the irregular topology, the routing process requires three main steps: 1) selecting an elevator for each packet in the source router and then routing the packet to that elevator; 2) vertically routing the packet to the destination layer; and 3) routing the packet from the elevator to the destination node. In this routing process, the elevator selection (the first step) is critical as the number of vertical paths (elevators) is much smaller than the number of horizontal paths, putting significantly more traffic pressure on the elevators.

Fig. 1(a) shows an example of a PC-3DNoC with three elevators ($e_1$–$e_3$) using Elevator-First-based elevator selection [4], [10], [11] (i.e., the closest elevator to the source router is selected). Routers are colored with the color of the elevator they would use under the Elevator-First policy: four routers will use the green ($e_1$) elevator, seven will use the blue ($e_2$) elevator, and five will use the red ($e_3$) elevator. Unfortunately, such an uneven elevator utilization can put severe traffic pressure on certain elevators ($e_2$ in this example). Ideally, some of the load on the $e_2$ elevator could be assigned to the $e_1$ or $e_3$ elevators, making the $e_2$ elevator less congested. Fig. 1(b) demonstrates the utilization of the middle-layer routers with the Elevator-First selection policy under uniform traffic. As can be seen, $e_2$ is highly congested due to the uneven elevator selection. In terms of energy efficiency, the best elevator selection is on the minimal path between the source and destination. However, as can be seen in Fig. 1(a) for the path between S and D, policies like Elevator-First (red-dotted line) do not necessarily choose the minimal path (blue-solid line).

AdEle will consider both traffic distribution and energy efficiency to select optimal elevators and evenly distribute traffic loads among the elevators.
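To make the uneven utilization under a nearest-elevator policy concrete, the following is a minimal Python sketch that assigns every router of one 4 × 4 layer to its closest elevator and counts the routers per elevator. The elevator coordinates and the tie-breaking rule (lowest elevator index wins) are assumptions chosen for illustration; they are not the placement of Fig. 1.

```python
from collections import Counter


def manhattan(a, b):
    """Manhattan distance between two (x, y) positions on one layer."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def nearest_elevator(router, elevators):
    """Index of the closest elevator; ties go to the lowest index (assumption)."""
    return min(range(len(elevators)),
               key=lambda k: (manhattan(router, elevators[k]), k))


def elevator_first_assignment(width, height, elevators):
    """Map every router (x, y) of a layer to its nearest elevator index."""
    return {(x, y): nearest_elevator((x, y), elevators)
            for x in range(width) for y in range(height)}


def routers_per_elevator(assignment, num_elevators):
    """Count how many routers would use each elevator."""
    counts = Counter(assignment.values())
    return [counts.get(k, 0) for k in range(num_elevators)]


# Hypothetical elevator positions on a 4 x 4 layer (not taken from the paper).
ELEVATORS = [(0, 0), (2, 1), (3, 3)]
assignment = elevator_first_assignment(4, 4, ELEVATORS)
counts = routers_per_elevator(assignment, len(ELEVATORS))
print(counts)
```

With this hypothetical placement the centrally located elevator attracts the most routers, which is the kind of imbalance the motivation above describes.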
To the best of our knowledge, AdEle is the first congestion- and energy-aware elevator-selection scheme in PC-3DNoCs that includes elevator redundancy and online policies to accommodate dynamic traffic behavior while relying only on local router information.

B. Optimal Elevator-Subset for Each Router

To find the optimal subset of elevators for each router, AdEle performs an offline optimization to distribute the expected traffic load across all elevators and minimize the average inter-node (source to destination) distance. To do this, we first define two optimization objectives: 1) elevator-utilization variance to improve the traffic load distribution, and 2) average inter-node distance to minimize the energy consumption. Leveraging these objective functions, we will use a multi-objective simulated-annealing-based algorithm (AMOSA [13]) to find the optimal elevator subsets.
1) Objective 1 - Elevator Utilization:
To balance the traffic on the elevators, AdEle attempts to minimize the elevator-utilization variance. As discussed above, it is important to evenly distribute the traffic over elevators to avoid highly congested elevators. To calculate the utilization variance, let us consider an $N$-node/router network with a set of elevators $\mathcal{E} = \{e_1, e_2, \ldots, e_E\}$, where $E$ is the total number of elevators. Moreover, assume that during runtime, each router $i$ can select its elevator from a subset $A_i \subseteq \mathcal{E}$. For simplicity, for now we assume that each router selects each elevator from its elevator subset ($A_i$) uniformly (e.g., using a round-robin policy). Therefore, the utilization of elevator $e$ ($U_e$) is:

$$U_e = \sum_{i=1}^{N} \frac{1}{|A_i|} \sum_{j=1}^{N} f_{ij} \cdot P_{ije}, \quad (1)$$

where $f_{ij}$ is the frequency of traffic between routers $i$ and $j$, and $P_{ije}$ denotes whether the routing between routers $i$ and $j$ uses the elevator $e$ ($P = 1$) or not ($P = 0$). Leveraging (1), the average traffic over all the elevators ($\mu$) is:

$$\mu = \frac{1}{E} \sum_{i=1}^{E} U_i. \quad (2)$$

Using (1) and (2), the elevator-utilization variance is:

$$\sigma^2 = \frac{1}{E} \sum_{i=1}^{E} (U_i - \mu)^2. \quad (3)$$

Minimizing the elevator-utilization variance will result in a better distribution of traffic load on the elevators and lower network latency.
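The objective in (1)–(3) can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the predicate `inter_layer_uses` is a hypothetical stand-in for $P_{ije}$ that assumes a router pair uses elevator $e$ whenever the two routers sit on different layers and $e$ belongs to the source's subset, and the tiny four-router network is made up.

```python
def utilization(subsets, freq, uses_elevator, num_elevators):
    """Per-elevator utilization U_e following (1).

    subsets[i]     -> elevator indices router i may select (A_i)
    freq[i][j]     -> traffic frequency f_ij between routers i and j
    uses_elevator  -> callable (i, j, e) returning 1 or 0, standing in for P_ije
    """
    n = len(subsets)
    util = [0.0] * num_elevators
    for e in range(num_elevators):
        for i in range(n):
            share = 1.0 / len(subsets[i])  # uniform selection within A_i
            util[e] += share * sum(freq[i][j] * uses_elevator(i, j, e)
                                   for j in range(n))
    return util


def utilization_variance(util):
    """Mean (2) and variance (3) of the elevator utilizations."""
    mu = sum(util) / len(util)
    return sum((u - mu) ** 2 for u in util) / len(util)


def inter_layer_uses(subsets, layers):
    """Hypothetical P_ije: 1 if i and j are on different layers and e is in A_i."""
    return lambda i, j, e: int(layers[i] != layers[j] and e in subsets[i])


# Made-up example: 4 routers on 2 layers, 2 elevators, uniform traffic.
LAYERS = [0, 0, 1, 1]
FREQ = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
balanced = [[0], [0, 1], [1], [0, 1]]
skewed = [[0], [0], [0], [0]]

u_balanced = utilization(balanced, FREQ, inter_layer_uses(balanced, LAYERS), 2)
u_skewed = utilization(skewed, FREQ, inter_layer_uses(skewed, LAYERS), 2)
var_balanced = utilization_variance(u_balanced)
var_skewed = utilization_variance(u_skewed)
```

In this toy case both assignments carry the same total traffic, but sending every router to one elevator yields a much larger variance than the balanced subsets, which is the quantity the offline search tries to reduce.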
2) Objective 2 - Average Distance:
To improve network energy efficiency, AdEle attempts to minimize the average distance. As elevator selection is under consideration here, we only consider inter-layer traffic. Therefore, the distance between inter-layer nodes $i$ and $j$ over an elevator $e$ can be defined as:

$$D^{e}_{ij} = \begin{cases} 0, & i \text{ and } j \text{ are on the same layer} \\ d_{se} + d_{e} + d_{ed}, & \text{otherwise}, \end{cases} \quad (4)$$

where $d_{se}$, $d_{e}$, and $d_{ed}$ are the Manhattan distances between the source and elevator, on the elevator (inter-die), and from the elevator to the destination, respectively. Based on (4), the average inter-layer-node distance in an $L$-layer network is:

$$AD = \frac{1}{N \times \left(\frac{L-1}{L} \times N\right)} \sum_{i=1}^{N} \frac{1}{|A_i|} \sum_{e=1}^{|A_i|} \sum_{j=1}^{N} D^{e}_{ij}. \quad (5)$$
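As a rough illustration of (4) and (5), the Python sketch below computes the average inter-layer distance for a toy network. It assumes routers are given as (x, y, layer) coordinates and that each elevator is described by a single (x, y) position shared by all layers; both conventions, and the 2 × 1 × 2 example network, are assumptions made for this sketch.

```python
def distance_via_elevator(src, dst, elev):
    """D^e_ij from (4): 0 on the same layer, else d_se + d_e + d_ed."""
    if src[2] == dst[2]:
        return 0
    d_se = abs(src[0] - elev[0]) + abs(src[1] - elev[1])  # source to elevator
    d_e = abs(src[2] - dst[2])                            # vertical hops
    d_ed = abs(elev[0] - dst[0]) + abs(elev[1] - dst[1])  # elevator to destination
    return d_se + d_e + d_ed


def average_distance(routers, subsets, elevators, num_layers):
    """AD from (5), averaging over each router's elevator subset A_i."""
    n = len(routers)
    total = 0.0
    for i, src in enumerate(routers):
        per_router = sum(distance_via_elevator(src, dst, elevators[e])
                         for e in subsets[i] for dst in routers)
        total += per_router / len(subsets[i])
    inter_layer_pairs = n * ((num_layers - 1) / num_layers * n)
    return total / inter_layer_pairs


# Made-up 2 x 1 x 2 network with one elevator above each column.
ROUTERS = [(0, 0, 0), (1, 0, 0), (0, 0, 1), (1, 0, 1)]
ELEVATOR_XY = [(0, 0), (1, 0)]

ad_single = average_distance(ROUTERS, [[0]] * 4, ELEVATOR_XY, 2)
ad_both = average_distance(ROUTERS, [[0, 1]] * 4, ELEVATOR_XY, 2)
ad_nearest = average_distance(ROUTERS, [[0], [1], [0], [1]], ELEVATOR_XY, 2)
```

In this toy case, restricting each router to its own column's elevator gives the shortest average distance, while larger subsets trade some distance for the freedom to balance load, which is the tension between the two objectives.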
3) Multi-Objective Optimization:
We use a multi-objective simulated-annealing-based optimization algorithm (AMOSA [13]) to find a set of optimal elevator subsets for all the routers in the network ($A = \{A_1, \ldots, A_N\}$) while minimizing the objective functions in (3) and (5). As AMOSA is a multi-objective optimization search, it offers a set of solutions that lie on the Pareto front of the optimization objectives (see [13] for more details). The AMOSA-based optimization in AdEle provides different optimal solutions in terms of latency and energy efficiency. From these solutions, a designer can make trade-offs when choosing between more latency-aware or energy-aware solutions (see Fig. 3). The selection of solutions is discussed in detail in Section IV.

C. Adaptive Elevator Selection
Here, we discuss how a router $i$ can efficiently select an elevator during runtime from its elevator subset ($A_i$) identified in the previous subsection. As we are interested in an even distribution of traffic load over all the elevators to reduce traffic congestion during runtime, we apply an enhanced round-robin (RR) algorithm to select an elevator. In a conventional RR approach, elevators would be selected in a sequential order without considering the runtime traffic. To account for runtime traffic, we include the probability of skipping ($P^{S}_{ik}$) a congested elevator ($k$) for router $i$ in the RR approach. $P^{S}_{ik}$ is adjusted based on the average latency imposed by the elevator $k$, i.e., higher latencies seen using elevator $k$ increase the probability of skipping it in the future. Accordingly, AdEle can adaptively manage dynamic traffic loads and congestion.

To find $P^{S}_{ik}$, let us first define a cost function associated with making a selection from an elevator subset. After selecting an elevator, AdEle estimates the cost of this selection by considering the time between when the first flit (the header flit) and when the last flit (the tail flit) leave the source router. The latency imposed by selecting an elevator $e_k$ from a subset $A_i$ is:

$$T_{e_k} = \frac{t_{tail} - t_{head} - l_p}{l_p}, \quad (6)$$

where $t_{tail}$ and $t_{head}$ denote the time when the tail flit and the header flit leave the source router, respectively. Also, $l_p$ is the length of the packet. The elevator-selection cost ($C_k$) can be updated using the latency of the last selection defined in (6) and based on:

$$C_k \leftarrow (a \times T_{e_k}) + ((1 - a) \times C_k), \quad 0 \le a \le 1, \quad (7)$$

where $a$ is a coefficient to increase or decrease the impact of the new cost versus the old cost. The value of $a$ is chosen experimentally. Note that this cost relies on only local information. With wormhole switching, any blocking in an elevator can be propagated along the path from the elevator to the source router. Therefore, blocking at a source router can be interpreted as blocking in the elevator.
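The local cost tracking in (6) and (7) amounts to an exponentially weighted moving average of a normalized blocking time. The Python sketch below shows one way a router could keep these costs; the class name, the initial cost of zero, and the coefficient `a = 0.5` are assumptions for illustration only (the coefficient the authors used is not reproduced here).

```python
def selection_latency(t_head, t_tail, packet_len):
    """T_ek from (6): extra cycles the packet spent leaving the source,
    normalized by the packet length l_p."""
    return (t_tail - t_head - packet_len) / packet_len


class ElevatorCostTracker:
    """Per-router table of elevator-selection costs C_k updated as in (7)."""

    def __init__(self, num_elevators, a):
        if not 0.0 <= a <= 1.0:
            raise ValueError("a must lie in [0, 1]")
        self.a = a
        # Assumption: every elevator starts with zero cost.
        self.cost = [0.0] * num_elevators

    def update(self, k, t_head, t_tail, packet_len):
        """Blend the latest latency of elevator k into its running cost."""
        latency = selection_latency(t_head, t_tail, packet_len)
        self.cost[k] = self.a * latency + (1.0 - self.a) * self.cost[k]
        return self.cost[k]


# Hypothetical usage: an 8-flit packet that is blocked, then one that is not.
tracker = ElevatorCostTracker(num_elevators=3, a=0.5)
after_blocked = tracker.update(1, t_head=100, t_tail=124, packet_len=8)
after_clear = tracker.update(1, t_head=200, t_tail=208, packet_len=8)
```

A larger `a` makes the cost react faster to the most recent packet, while a smaller `a` smooths out short bursts; only timestamps observed at the source router are needed.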
Note that incorporating global-network information into AdEle would improve the selection policy but be less practical, as it would impose high hardware area, energy consumption, and latency costs.

TABLE I. SIMULATION SETUP (network sizes and elevator-placement patterns $P_M$, $P_{S1}$, $P_{S2}$, and $P_{S3}$)

Considering (7), we can define router $i$'s relative cost of selecting elevator $k$ from $A_i$ versus other possible elevators:

$$C^{rel}_{ik} = \frac{C_k}{\sum_{p=0}^{|A_i|} C_p}. \quad (8)$$

Based on the relative cost of a particular elevator selection, the probability of skipping that elevator in the RR approach is:

$$P^{S}_{ik} = \begin{cases} 1 - \xi, & \text{if } C^{rel}_{ik} \ge \frac{2}{N}; \\ N \times \left(C^{rel}_{ik} - \frac{1}{N}\right) \times (1 - \xi), & \text{if } \frac{2}{N} > C^{rel}_{ik} \ge \frac{1}{N}; \\ 0, & \text{otherwise.} \end{cases} \quad (9)$$

Here, $\xi$ allows for exploring new solutions even under high relative costs. To see the role of $\xi$, suppose that the $P^{S}_{ik}$ of a selection is 1 because of high congestion. In this case, the elevator $k$ will not be selected in the RR sequence at all and will have no chance to update its elevator-selection cost ($C_k$). This would keep $P^{S}_{ik}$ high and prevent the elevator from observing any changes in its cost. To address such an update failure, $\xi$ allows every elevator to be selected with a low probability regardless of $P^{S}_{ik}$ so that the cost function has a chance to update. To improve energy efficiency, when $C_k$ is below a threshold for all $k$ (low-latency applications) and congestion is not a concern, AdEle will instead choose the elevator along the minimal path (discussed in Section III.A). Here, we experimentally find the threshold that minimizes the latency for each traffic and elevator configuration. Our future work will investigate dynamic threshold management.

IV. SIMULATION AND EVALUATION RESULTS
Here, we compare AdEle against two well-known elevator-selection approaches: Elevator-First [10] and CDA [12]. The simulation setup is shown in Table I. In PC-3DNoCs, the number and location of elevators are limited by hardware constraints [9]. Therefore, AdEle is evaluated using different elevator-placement patterns to show that its efficacy is independent of any such patterns. Also, because of the performance-area trade-off in PC-3DNoCs [1], various elevator concentrations might be employed. Therefore, here we simulate different concentrations of elevators to show that AdEle's performance is not limited by elevator concentration. Three elevator patterns are considered for a 4 × 4 × 4 network ($P_{S1}$–$P_{S3}$) with different levels of elevator concentration. Two of these patterns are extracted to have an optimized average distance, and the third is based on [4]. A large network (8 × 8 × 4) is simulated to show the scalability of AdEle. The pattern for this network ($P_M$) is also extracted based on the average distance optimization.

Fig. 3. Elevator-selection solutions found by AMOSA optimization in AdEle. (Vertical axis: average distance. Legend: optimized solutions, 0.1% of explored solutions, selected solutions $S_1$–$S_6$, and Elevator-First.)

TABLE II. PERFORMANCE OF SELECTED SOLUTIONS FROM FIG. 3

Metric      Elev.-First   S1    S2    S3    S4     S5     S6
Latency*    131           463   159   132   90.5   88.1   78
Energy*     …             …     …     …     …      …      …

* Latency: average latency (cycles). Energy: energy/flit (nJ). A check mark denotes the selected solution.

AdEle's offline optimization (see Section III.B) is implemented in Python to extract the elevator subsets for routers. These subsets are added to the AdEle router implemented in the Access Noxim simulator [17]. We considered uniform traffic for the offline optimization, the most pessimistic assumption (i.e., traffic is not known a priori), while the network simulations are done using different synthetic and real-application traffic. Our analysis will demonstrate that AdEle does not require runtime traffic in its offline optimization as its online selection policy will adjust to runtime traffic. However, AdEle can use the runtime traffic during elevator-subset selection to offer further latency and energy improvement.

A. AMOSA Elevator-Subset Exploration
As discussed in Section III, AMOSA finds various solutions with different latency and energy efficiency. To show the solution selection process, the optimization for $P_M$ is detailed here. A small sample of AMOSA's explored solutions is shown in Fig. 3. As AMOSA explores the solution space, it makes its way towards the Pareto front (blue curve) to find the optimal trade-offs between utilization variance and average distance. Given the set of solutions, the final solution can be selected depending on the importance of energy efficiency (average distance) and latency (utilization variance). For brevity, several of these points spread along the Pareto front are selected for network simulation ($S_1$ to $S_6$), and the results are summarized in Table II. Considering Table II and Fig. 3, lower utilization variance and lower average distance improve the latency and energy consumption, respectively. As we are able to significantly reduce the latency with fairly minimal increases in energy, we select one of these solutions (marked as selected in Table II) for further analysis. Similarly, we select the solutions for $P_{S1}$–$P_{S3}$.

B. AdEle Performance Under Synthetic Traffic
To compare AdEle with Elevator-First and CDA, we first evaluate the average latency under uniform and shuffle traffic patterns and with different elevator-placement patterns in Fig. 4. Across all the traffic and elevator-placement patterns, AdEle achieves the lowest latency and highest saturation threshold. Note that CDA is able to approach AdEle's performance because it considers global intra-layer traffic. In this work, we do not consider the high cost of CDA's global information sharing and optimistically assume that the information is instantaneously received at every router. In reality, CDA will likely perform much worse with stale information or include significant implementation overhead. With a higher elevator density (as in the denser small-network patterns), the elevator congestion issue is less critical and intra-layer traffic will be more critical. Similarly, in a network with larger horizontal dimensions like $P_M$, intra-layer traffic is more important. Yet, AdEle shows better performance even with a high density of elevators and for $P_M$. Recall that AdEle's offline optimization step used uniform traffic. Yet, as Figs. 4(e)–(h) show, although the shuffle traffic is new for AdEle, it still achieves the lowest latency because its online selection policy can monitor runtime congestion and select better elevators. If the traffic is known a priori, AdEle can use this traffic information during offline optimization to improve elevator-subset selection even further. For $P_M$ in Figs. 4(d) and 4(h), we also include the average latency of AdEle with standard RR selection. This demonstrates that AdEle's proposed online skipping policy achieves higher improvements in latency compared to RR in both uniform and shuffle traffic patterns.

Fig. 4. Average latency for Elevator-First, CDA, and AdEle under uniform (a–d) and shuffle (e–h) traffic and with different elevator-placement patterns. (Axes: packet injection rate vs. latency in cycles. Panels (a)–(c) and (e)–(g): the three $P_S$ patterns; panels (d) and (h): $P_M$, which also includes AdEle-RR.)

To show the main reason for latency improvement when using AdEle, the load distribution over routers with elevators for one of the small-network patterns is shown in Fig. 5. The white bar shows the average load over elevator-less routers. The other colored bars show the load over different elevators. As can be seen, AdEle reduces the load on the highest utilized elevator (blue elevator). The energy consumption for each approach and elevator placement is shown in Fig. 6 for low and high injection rates, chosen based on the saturation point (the injection rate at which latency is 10× the zero-load latency) for each configuration. For low injection rates, AdEle has the lowest energy consumption because it switches to minimal routing and uses the minimal paths. On the other hand, AdEle incurs a small energy overhead (less than 8.2% compared to CDA) under high injection rates to take non-minimal paths and reduce traffic congestion. If less energy overhead is desired, AdEle can use configurations with lower energy (see Table II).

Fig. 5. Traffic load over routers with elevators (blue, green, and red) normalized to the average load over routers without an elevator (white bar), for ElevFirst, CDA, and AdEle.

Fig. 6. Energy per flit for Elevator-First (ElevFirst), CDA, and AdEle normalized to ElevFirst, and under different injection rates: (a) low injection rate; (b) high injection rate. (Patterns: $P_{S1}$, $P_{S2}$, $P_{S3}$, and $P_M$.)

C. AdEle Performance under Real-Application Traffic
We extracted the traffic of several SPLASH-2 [18] and PARSEC [19] benchmarks using Gem5 [20] for real-application simulations. Because Gem5 is limited to 64 cores, we demonstrate our results for $P_{S1}$–$P_{S3}$. As shown in Fig. 7, AdEle improves the network latency in nearly all cases. In particular, AdEle has more improvements in applications with higher traffic loads (canneal, fft, radix, and water) as there is more opportunity to reduce the resulting elevator congestion. In applications with lower traffic loads (fluidanimate and lu), AdEle maintains similar performance to the other approaches as there is little contention on the elevators and the latency is close to zero-load latency. Although the pattern with the fewest elevators (three) still shows some improvements for AdEle, the lower number of elevators results in minimal opportunity for AdEle to redirect traffic and improve latency. On average, AdEle improves the network latency by 14.9% (up to 21.4%) compared to CDA and by 29.9% (up to 52.1%) compared to Elevator-First under $P_{S1}$–$P_{S3}$. Fig. 7(d) shows, for each elevator-placement pattern ($P_{S1}$–$P_{S3}$), the average energy over all the applications normalized to Elevator-First. AdEle imposes a small overhead because it may route packets over non-minimal paths in case of congestion to improve latency. Compared to CDA, AdEle has on average 10.5%, 10.3%, and 10% energy overhead under $P_{S1}$, $P_{S2}$, and $P_{S3}$, respectively.

Fig. 7. Latency ((a)–(c), per application) and energy ((d), averaged across all applications) for Elevator-First (ElevFirst), CDA, and AdEle normalized to ElevFirst under real-application traffic with different elevator-placement patterns. *Avg. in (a)–(c) is the average of all six applications (canneal, fft, fluidanimate, lu, radix, and water).

TABLE III. AREA ANALYSIS

Columns: cycles and router area (µm²). Base (ElevFirst): 1 cycle, 35550 µm². Overhead rows: CDA* and AdEle.

* Global information sharing is not included.

D. Hardware-Area Analysis and Comparison
The router hardware for Elevator-First, AdEle, and CDA is implemented and analyzed using Cadence Genus in a 45 nm technology. Here, we consider a 1 GHz clock. The results are shown in Table III. Compared to CDA*, AdEle has a smaller area overhead. This is because AdEle only requires local traffic information while CDA requires a table to save global traffic information and find the best path in each router. However, CDA*'s area overhead is an optimistic assumption here as it does not include any overhead related to the actual sharing of information. Therefore, a real CDA implementation will likely impose higher area and latency overheads. Also, AdEle does not affect the router stages and will scale well with the network size, while CDA requires an additional cycle (or more for larger networks) to update its tables.

V. CONCLUSION
This paper proposes AdEle, an adaptive congestion- and energy-aware elevator-selection scheme to address elevator overutilization in partially connected 3D NoCs. Employing a set of elevators instead of one elevator for each source router, AdEle monitors the network traffic and provides an online policy to select the proper elevator while considering runtime traffic loads. AdEle only requires local router information and is able to improve average latency in various scenarios under both synthetic and real traffic at the cost of less than 10.5% in energy consumption. Moreover, AdEle can be easily adjusted to consider faults, which is of great interest in PC-3DNoCs, while considering elevator congestion.

REFERENCES

[1] A. I. Arka et al., "Making a case for partially connected 3D NoC: NFIC versus TSV," ACM JETC, vol. 16, no. 4, pp. 1–17, 2020.
[2] T. Lu et al., "TSV-based 3-D ICs: Design methods and tools," IEEE TCAD, vol. 36, no. 10, pp. 1593–1619, 2017.
[3] R. Salamat et al., "LEAD: An adaptive 3D-NoC routing algorithm with queuing-theory based analytical verification," IEEE TC, vol. 67, no. 8, pp. 1153–1166, 2018.
[4] A. Coelho et al., "FL-RuNS: A high-performance and runtime reconfigurable fault-tolerant routing scheme for partially connected three-dimensional networks on chip," IEEE TNANO, vol. 18, pp. 806–818, 2019.
[5] X. Wang et al., "HRC: A 3D NoC architecture with genuine support for runtime thermal-aware task management," IEEE TC, vol. 66, no. 10, pp. 1676–1688, 2017.
[6] T. H. Vu et al., "Fault-tolerant spike routing algorithm and architecture for three dimensional NoC-based neuromorphic systems," IEEE Access, vol. 7, pp. 90436–90452, 2019.
[7] F. Wang et al., "An effective approach of reducing the keep-out-zone induced by coaxial through-silicon-via," IEEE T-ED, vol. 61, no. 8, pp. 2928–2934, 2014.
[8] T. Frank et al., "Resistance increase due to electromigration induced depletion under TSV," in IEEE IRPS, 2011.
[9] A. Eghbal et al., "Analytical fault tolerance assessment and metrics for TSV-based 3D network-on-chip," IEEE TC, vol. 64, no. 12, pp. 3591–3604, 2015.
[10] F. Dubois et al., "Elevator-first: A deadlock-free distributed routing algorithm for vertically partially connected 3D NoCs," IEEE TC, vol. 62, no. 3, pp. 609–615, 2011.
[11] E. Taheri et al., "Addressing a new class of reliability threats in 3-D network-on-chips," IEEE TCAD, vol. 39, no. 7, pp. 1358–1371, 2020.
[12] Y. Fu et al., "Congestion-aware dynamic elevator assignment for partially connected 3D-NoCs," in IEEE ISCAS, 2019.
[13] S. Bandyopadhyay et al., "A simulated annealing-based multiobjective optimization algorithm: AMOSA," IEEE Transactions on Evolutionary Computation, vol. 12, no. 3, pp. 269–283, 2008.
[14] J. Lee et al., "Redelf: An energy-efficient deadlock-free routing for 3D NoCs with partial vertical connections," ACM JETC, vol. 12, no. 3, pp. 1–22, 2015.
[15] R. Salamat et al., "A resilient routing algorithm with formal reliability analysis for partially connected 3D-NoCs," IEEE TC, vol. 65, no. 11, pp. 3265–3279, 2016.
[16] S. Foroutan et al., "Assignment of vertical-links to routers in vertically-partially-connected 3-D NoCs," IEEE TCAD, vol. 33, no. 8, pp. 1208–1218, 2014.
[17] K.-Y. Jheng et al., "Traffic-thermal mutual-coupling co-simulation platform for three-dimensional network-on-chip," in IEEE VLSI-DAT, 2010.
[18] S. C. Woo et al., "The SPLASH-2 programs: Characterization and methodological considerations," ACM SIGARCH Computer Architecture News, vol. 23, no. 2, pp. 24–36, 1995.
[19] C. Bienia and K. Li, Benchmarking modern multiprocessors. Princeton University, Princeton, NJ, 2011.
[20] N. Binkert et al., "The Gem5 simulator,"