3DCAM: A Low Overhead Crosstalk Avoidance Mechanism for TSV-Based 3D ICs
Reza Mirosanlou, Mohammadkazem Taram, Zahra Shirmohammadi, Seyed-Ghassem Miremadi
Reza Mirosanlou∗, University of Waterloo
Mohammadkazem Taram, University of California San Diego
Zahra Shirmohammadi, Shahid Rajaee Teacher Training University
Seyed-Ghassem Miremadi, Sharif University of Technology

Abstract
Three-dimensional integrated circuits (3D ICs) offer lower power consumption, higher performance, higher bandwidth, and better scalability than conventional two-dimensional ICs. Through-Silicon Via (TSV) is one of the fabrication mechanisms that connects stacked dies to each other. The large size of TSVs and the proximity between them lead to undesirable coupling capacitance. This interference causes mutual influence between adjacent TSVs and produces crosstalk noise, which threatens the reliability of data as it traverses between layers. This paper proposes a mechanism that efficiently reduces crosstalk noise between TSVs with lower area overhead than previous works. The mechanism is built on the observation that retaining a TSV's value in its current state can reduce coupling in some cases. To evaluate the mechanism, the gem5 simulator is used for data extraction, and several benchmarks are taken from the SPEC2006 suite. The simulation results show that the proposed mechanism reduces crosstalk noise with only 30% TSV overhead, while delay decreases by up to 25.7% compared to a recent related work.

Keywords: Interconnection · TSV · 3D IC · 3D Integration · Crosstalk · NoC
Technology node scaling in recent decades has cut gate delay while raising interconnection latency [1]. Hence, interconnects have become a major performance bottleneck of high-performance Systems-on-Chip (SoCs) and Integrated Circuits (ICs) [23]. In addition, interconnections have become more susceptible to noise, in particular crosstalk [23]. On the other hand, the advent of multi-core processors with an ever-increasing number of cores has highlighted the need for fast and reliable interconnections. One potential solution to the interconnection delay problem is three-dimensional integration using Through-Silicon Vias (TSVs). Vertical integration of IC dies using TSVs offers high-density connections between adjacent dies. This technology also allows stacking dies built in different technologies, such as CMOS logic with high-density DRAM, which can be used to mitigate the memory wall problem [19]. Furthermore, the average and maximum distances between interconnect nodes in 3D stacked ICs are greatly decreased, which leads to significant delay, power, and area improvements. Despite these advantages, adjacent, short, and bounded TSVs are prone to TSV-to-TSV coupling and crosstalk noise, which increases transmission time and power consumption and, more importantly, threatens signal integrity [17, 18].

∗ This work was done while the authors were at Sharif University of Technology.

Accepted to ICCD 2015
Figure 1: Overview of coupling characteristics in TSVs [8]. According to ITRS [15], TSV height is predicted to reach 20-50 µm and via diameter 2-8 µm by 2018.

As demonstrated in Fig. 1, every TSV may be surrounded by neighbouring TSVs, which cause large and undesirable coupling noise. This TSV-to-TSV coupling can be very challenging in 3D ICs because TSVs are large and thick, so the coupling between two adjacent TSVs can be substantial. Moreover, the effective coupling capacitance between TSVs doubles when the aggressor and the victim signals switch in opposite directions [18].

Plenty of crosstalk minimization methods have been proposed in the literature of 2D design (e.g., [5, 11, 12, 14, 26]). However, these methods cannot be directly applied to alleviate TSV-to-TSV crosstalk noise, because TSVs are not placed in a single plane and are affected by more than two aggressors [27]. Recent efforts in TSV-to-TSV crosstalk minimization, including [4, 17, 27], are complex and impose significant area and TSV overhead. ShieldUS [4] adds a crossbar that remaps data to TSVs so that more active signals are shielded by signals predicted to have fewer transitions in the future. In addition to its complex decision-making circuit, the accuracy of its predictor is questionable, since the signals may not follow a regular pattern. 3DLAT [27] exploits less adjacent transition codes to limit the maximum number of transitions in adjacent TSVs. Crosstalk Avoidance Codes (CAC) [17] are another coding scheme for TSV-to-TSV crosstalk minimization. These approaches also need a complex and large coder and suffer from considerable information redundancy overhead.

In this paper, we propose a TSV-to-TSV crosstalk minimization method, named 3DCAM, which can effectively reduce coupling noise between TSVs with relatively low area and TSV overhead.
In addition, the proposed method uses a small, simple coder, which reduces run-time performance overhead. When a transition occurs on a target signal, 3DCAM considers the target's neighbours and their coupling effect and decides whether to retain the target's value or send its original transition. When the coder decides to retain the value, it informs the decoder through a control TSV. The simulation results show that 3DCAM can reduce the transmission delay by up to 25.7% compared to the 3DLAT mechanism. 3DCAM imposes only 30% TSV overhead, which is much less than the 3DLAT TSV overhead (80% for ω = 4).

The rest of this paper is structured as follows. Section 2 reviews related work. Section 3 describes the crosstalk model for TSV-based 3D ICs on which 3DCAM is built. Section 4 presents the 3DCAM crosstalk avoidance mechanism. Section 5 explains the simulations and results, and Section 6 concludes the paper.

In the context of two-dimensional Networks-on-Chip (2D NoCs), there are plenty of works that target the power consumption [9, 10, 20], reliability [20], security [24], or performance [20] of the interconnections. In particular, crosstalk minimization methods can be classified into three categories: physical-level, transistor-level, and Register Transfer Level (RTL) techniques. Wire spacing [1], active and passive shielding [26], and buffer insertion [2] are examples of physical-level techniques. The technique in [14] is a transistor-level mechanism that reduces crosstalk noise by skewing simultaneous opposite transitions. Although this approach reduces crosstalk, it requires timing adjustment between senders and receivers and suffers from run-time management overhead. The general idea behind RTL techniques is to omit some undesirable transition patterns by using coding schemes. A variety of works focus on the analytical aspects and coding concepts [5, 25]. Error detection and error correction codes [12], joint crosstalk avoidance mechanisms [25], and CACs [12] are examples of these coding schemes.
Figure 2: The coupling capacitance crosstalk model for a 3 × 3 TSV cluster (TSVs $I_{-4}$ to $I_{4}$, with direct coupling $C_\alpha$ and diagonal coupling $C_\beta$)

Although the above approaches may cope with crosstalk in 2D ICs, they cannot be directly applied to 3D technologies, because the additional dimension substantially changes the crosstalk analysis. Grouping long and thick TSVs together causes new reliability issues, which have been studied recently [16, 22]. Several mechanisms have been proposed to make 3D ICs more reliable against crosstalk noise, e.g., [4, 6, 7, 17, 27]. TSV-to-TSV capacitive and inductive coupling are two major threats to 3D IC reliability. Previous works have addressed these effects from two perspectives: the works in [4, 17, 27] propose capacitance-based mechanisms, and those in [6, 7, 21] propose inductance-based techniques to reduce crosstalk effects in 3D ICs.

Increasing the distance between TSVs, shielding TSVs, inserting buffers at the victim side, decreasing driver size at the aggressor side, and increasing load at the wires are the mechanisms examined in [18] to mitigate TSV crosstalk noise. According to those experiments, unlike for 2D wires, increasing TSV distance is not an effective solution to the TSV-to-TSV coupling problem, and the other solutions either need high effort at post-design time or have a negative impact on timing performance.

RTL mechanisms for 3D ICs have been proposed and evaluated recently. The authors of [17] propose a coding scheme that reduces the maximum crosstalk by about 28% based on their proposed crosstalk model. Two other notable recent mechanisms against crosstalk noise in 3D ICs are ShieldUS [4] and 3DLAT [27]. ShieldUS uses data with fewer transitions as a shield for the more active data.
ShieldUS tries to minimize the average transmission time with a run-time mechanism that remaps data to TSVs in order to isolate links with specific crosstalk patterns from the others. The TSV overhead of this method is not considerable, because it uses the data itself as the shield. However, a large crossbar is required for bit shuffling, which imposes considerable area overhead, and this crossbar grows with the number of bits to shuffle. Besides, this method needs the data to be highly regular, because it tries to predict the activity of the signals.

The authors of [27] introduce the use of a less adjacent transition code along with transition signaling to minimize the number of transitions. Furthermore, 3DLAT reduces the frequency of higher crosstalk classes. This scheme has a significant TSV overhead: according to the authors' report, the TSV overhead of 3DLAT is about 80% with ω = 4, and the mechanism imposes more than 160% area overhead with ω = 2 for the same bitwidth. This method also suffers from the significant area overhead imposed by its complex coder.

In 2D integrated circuits, three neighbor wires affect each other and create coupling capacitance. The effective coupling capacitance imposed on the victim (i.e., middle) wire is modeled by Eq. (1) [14].

$$C_{eff} = C_G + C_C \left| \frac{\Delta V - \Delta V_{-1}}{V_{dd}} \right| + C_C \left| \frac{\Delta V - \Delta V_{+1}}{V_{dd}} \right| \qquad (1)$$
In Eq. (1), $\Delta V$ is the voltage swing on the victim wire, $\Delta V_{-1}$ and $\Delta V_{+1}$ are the voltage swings on the neighbor wires, and $V_{dd}$ is the supply voltage. In addition, $C_C$ is the coupling capacitance between the victim wire and each of its neighbors, and $C_G$ is the coupling capacitance between substrate and plate.

Based on Eq. (1), the transmission delay can be modeled by Eq. (2):

$$\tau = (1 + \rho\lambda)\,\pi_0 \qquad (2)$$

where $\rho$ is the coupling coefficient of adjacent wires, $\lambda$ is the ratio of coupling capacitance to substrate capacitance ($C_C / C_G$), and $\pi_0$ is the delay of a wire in the ideal channel, i.e., a channel without any coupling effect such as capacitance and inductance. For instance, when both aggressors switch in the direction opposite to the victim wire, a delay of $(1 + 4\lambda)\pi_0$ is imposed on the channel.

Similar to the 2D IC crosstalk model, we can derive a delay model for TSVs. As in previous studies [4, 17], and based on the inherent properties of TSVs, a square model of 9 adjacent TSVs is considered. Fig. 2 depicts 9 neighboring TSVs from the top view. We discuss the crosstalk effect on the victim TSV (marked in red in Fig. 2) and model the coupling capacitance noise that emanates from its neighbor TSVs. Direct neighbors (i.e., north, south, west, and east) are closer to the victim, so their coupling capacitances are more destructive than those of the diagonal neighbors (i.e., northeast, northwest, southeast, and southwest). To model the effective capacitance on the victim TSV, we extend Eq. (1) to Eq. (3):

$$C_{eff} = C_G + \sum_{i \in \text{direct}} C_\alpha \left| \frac{\Delta V_{I_0} - \Delta V_{I_i}}{V_{dd}} \right| + \sum_{i \in \text{diagonal}} C_\beta \left| \frac{\Delta V_{I_0} - \Delta V_{I_i}}{V_{dd}} \right| \qquad (3)$$

where the first sum runs over the four direct neighbors of the victim $I_0$ and the second over its four diagonal neighbors, $C_\alpha$ is the coupling capacitance between a direct aggressor and the victim, and $C_\beta$ is the coupling capacitance between a diagonal aggressor and the victim. Similar to the 2D crosstalk delay model, based on Eq. (3) we can extend Eq. (2) to model 3D TSVs as follows:

$$\tau = (1 + \rho_1\lambda_1 + \rho_2\lambda_2)\,\pi_0 \qquad (4)$$

where $\rho_1$ is the coupling coefficient of the direct aggressors, $\lambda_1$ is the ratio of direct coupling capacitance to substrate capacitance ($C_\alpha / C_G$), $\rho_2$ is the coupling coefficient of the diagonal aggressors, and $\lambda_2$ is the ratio of diagonal coupling capacitance to substrate capacitance ($C_\beta / C_G$). $\rho_1$ and $\rho_2$ indicate how changes in the aggressors' voltages affect the crosstalk on the victim. For instance, if a direct neighbor switches in the opposite direction, $\rho_1$ increases by two. In the worst case, all the neighbours switch from $V_{dd}$ to zero and the victim switches from zero to $V_{dd}$. In this case the delay is $(1 + 8\lambda_1 + 8\lambda_2)\pi_0$ and the effective capacitance is given by Eq. (5):

$$C_{eff} = [(C_\alpha + C_\beta) \times 8] + C_G = [(1.5\,C_\beta + C_\beta) \times 8] + C_G = 20 \times C_\beta + C_G \qquad (5)$$

As reported in [4, 27], $\lambda_1$ = 5.… and $\lambda_2$ = 3.…. For the sake of simplicity, we assume $\lambda_1 = 1.5\,\lambda_2$ and consequently $C_\alpha = 1.5\,C_\beta$; thus, as Eq. (5) shows, we can classify crosstalk patterns into 40 distinct classes, which are listed in Table 1. Patterns with higher class numbers have higher crosstalk noise and delay.

Table 1: TSV crosstalk classes

Class   C_eff
0       C_G
1       C_G + C_β
2       C_G + 1.5 C_β
3       C_G + 2 C_β
...     ...
39      C_G + 20 C_β

The third column of the original table, $T_{i-4,\dots,i+4}(t) \to T_{i-4,\dots,i+4}(t+1)$, gives an example transition pattern for each class.

As mentioned in Section 3, we have classified the patterns into 40 different classes based on their crosstalk effects. In this section, we propose the 3DCAM mechanism, which aims to minimize the effect of crosstalk on signal integrity and performance. To this end, we need to minimize the occurrences of higher crosstalk classes and maximize the occurrences of lower crosstalk classes, which requires changing the transition patterns on the TSVs. On a transmission line, one of the following conditions can occur on a single wire: 0 → 0, 0 → 1, 1 → 0, or 1 → 1. The motivation behind this work is the fact that retaining the previous value of a victim TSV can significantly affect the crosstalk class of the transmission.

Table 2 presents some examples in which retaining the victim (middle) TSV significantly reduces the crosstalk class. The first column of this table is the incoming pattern, in which the victim TSV may have a transition, and the second column shows the same pattern except that the victim's transition is eliminated. The up and down arrows show zero-to-$V_{dd}$ and $V_{dd}$-to-zero transitions, respectively, and dashes ('−') denote no transition. The changes in crosstalk class are given in the third column. For instance, the first row of Table 2 shows a case that falls into crosstalk class 24C, because the victim has three direct neighbors switching in the opposite direction ($C_{eff}$ increased by $3 \times 2 \times 1.5\,C_\beta$), one direct neighbor without a transition ($C_{eff}$ increased by $1 \times 1 \times 1.5\,C_\beta$), two diagonal neighbors with same-direction transitions (which impose no crosstalk), and two diagonal neighbors with no transitions ($C_{eff}$ increased by $2 \times 1 \times C_\beta$). So $C_{eff}$ is $C_G + 12.5\,C_\beta$ and, according to Table 1, the pattern falls into the 24C class. In a similar manner, the crosstalk class after eliminating the victim's transition is 12C. As this table shows, the crosstalk reduction achieved by this simple modification is considerable.

Table 2: Motivational example (each 3 × 3 pattern is listed row by row, rows separated by "/"; the middle entry is the victim)

Pattern                  Pattern*                 Crosstalk reduction
− ↑ ↓ / − ↓ ↑ / − ↑ ↓    − ↑ ↓ / − − ↑ / − ↑ ↓    24C → 12C
↓ ↓ − / − ↑ − / ↑ − ↓    ↓ ↓ − / − − − / ↑ − ↓
− ↓ − / ↓ ↑ ↓ / − ↓ −    − ↓ − / ↓ − ↓ / − ↓ −
↓ ↓ ↓ / ↓ ↑ ↓ / ↓ ↓ ↓    ↓ ↓ ↓ / ↓ − ↓ / ↓ ↓ ↓

In order to retain the victim TSV's value, 3DCAM must inform the receiver-side decoder that the victim's transition has been eliminated. Thus, 3DCAM reserves some TSVs for this purpose. These control TSVs only switch when the values of their corresponding victim TSVs are not valid (i.e., when 3DCAM has eliminated their transitions). Fig. 3 shows the layout of these control TSVs for a 3 × N link. The middle TSV of every 3 × 3 cluster (including overlapping ones) has a control TSV through which the 3DCAM coder informs the decoder whether the value of the TSV is valid. Since the control TSVs may couple with each other, we can repeat the technique and apply 3DCAM to them as well.

Figure 3: Layout of control TSVs for a 3 × N bus (Cluster 1, Cluster 2, Cluster 3)

Retaining the victim's value is not always beneficial to the crosstalk class. When the original transitions are good enough, retaining the victim's value may result in a worse crosstalk class. In addition, it has a negative impact on the control TSVs, because retaining a value requires a transition on a control signal. Table 3 shows an example in which retaining the victim's value leads to a worse crosstalk class and hence a worse transmission delay.

Table 3: Disruptive retaining example (same notation as Table 2)

Pattern                  Pattern*                 Crosstalk reduction
↓ ↓ ↓ / ↑ ↓ ↓ / ↓ ↓ ↓    ↓ ↓ ↓ / ↑ − ↓ / ↓ ↓ ↓

To address this issue, we introduce a parameter called Switch Threshold (ST). This parameter determines which patterns should be manipulated by 3DCAM. In other words, 3DCAM retains the value of the victim TSV only if the transitions of the neighbour TSVs form a pattern whose crosstalk class is higher than ST.

To find the optimal value for the ST parameter, we swept ST from 0C to 39C and measured the average transmission delay of several benchmarks. Fig. 6 depicts the results of this experiment. As this figure shows, the average delay is minimal when ST is set to 20. These results are also consistent with intuition, since setting ST to 20 bisects the crosstalk classes.
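The class computation and the ST rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the names `coupling`, `crosstalk_class`, and `retain_victim`, the +1/-1/0 encoding of transitions, and the row-major 3 × 3 layout are assumptions made here. The class index follows the ordering inferred from Table 1 and the 24C example (class 0 for no coupling, otherwise 2x - 1, where x is the coupling in units of $C_\beta$).

```python
# Transitions: +1 (0 -> Vdd), -1 (Vdd -> 0), 0 (no transition). Assumed encoding.
UP, DOWN, HOLD = 1, -1, 0

C_ALPHA = 1.5  # direct coupling in units of C_beta (C_alpha = 1.5 * C_beta)
C_BETA = 1.0   # diagonal coupling in units of C_beta

DIRECT = [(0, 1), (1, 0), (1, 2), (2, 1)]    # north, west, east, south
DIAGONAL = [(0, 0), (0, 2), (2, 0), (2, 2)]  # the four corners


def coupling(pattern):
    """Return (C_eff - C_G) / C_beta for the middle TSV of a 3 x 3 pattern."""
    victim = pattern[1][1]
    direct = sum(abs(victim - pattern[r][c]) for r, c in DIRECT)
    diagonal = sum(abs(victim - pattern[r][c]) for r, c in DIAGONAL)
    return C_ALPHA * direct + C_BETA * diagonal


def crosstalk_class(pattern):
    """Map a pattern to a class index 0..39 (ordering assumed from Table 1)."""
    x = coupling(pattern)
    return 0 if x == 0 else int(2 * x) - 1


def retain_victim(pattern, st=20):
    """Apply the ST rule: drop the victim's transition if the class exceeds st."""
    out = [list(row) for row in pattern]
    if crosstalk_class(pattern) > st:
        out[1][1] = HOLD  # eliminate the victim's transition
        return out, True
    return out, False


# First row of Table 2 and the pattern of Table 3.
row1 = [[HOLD, UP, DOWN], [HOLD, DOWN, UP], [HOLD, UP, DOWN]]
table3 = [[DOWN, DOWN, DOWN], [UP, DOWN, DOWN], [DOWN, DOWN, DOWN]]
```

With these definitions, `row1` evaluates to a coupling of 12.5 and class 24, and `retain_victim(row1)` yields a pattern of class 12, matching the 24C → 12C example in the text. The Table 3 pattern has a class below the default threshold, so it is sent unchanged.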
Fig. 4 and Fig. 5 illustrate the structure of the 3DCAM coder and decoder. In order to send 9 bits of data ($D_0$-$D_8$) from die X to die Y, the data are first delivered to the coder, which is placed in die X. The coder then evaluates the crosstalk class that would be imposed on the victim TSV ($I_0$) and checks whether it is greater than the ST parameter; if so, the coder eliminates the victim's transition and flips the control bit. Next, the data and control bit are sent to die Y through TSVs. At die Y, the decoder receives the data and control signals and, based on the control signal, determines the original value of the victim TSV. It is noteworthy that, owing to the straightforward functionality of the 3DCAM coder and decoder, they can be implemented by simple circuits.

Figure 4: 3DCAM coding mechanism

Figure 5: 3DCAM decoding mechanism

Because of the controlling mechanism, we have to reserve an extra TSV for each cluster of 9 TSVs. As a result, 3DCAM incurs about 30% TSV overhead. Fig. 7 shows the TSV overhead of the proposed method, ShieldUS, and 3DLAT (ω = 4). As Fig. 7 depicts, the TSV overhead of 3DLAT is about 80%, and ShieldUS imposes no TSV overhead on the circuit because it only shuffles and remaps the data. However, ShieldUS needs a sizable crossbar to remap data to TSVs.

In this section, the proposed mechanism is evaluated and compared with two 3D crosstalk reduction schemes, 3DLAT [27] (with ω = 4) and ShieldUS [4] (with interval = 100). Based on the crosstalk model proposed in Section 3, we measured the amount of crosstalk reduction and the average delay on real traces taken from the SPEC2006 benchmark suite [13]. We used the gem5 simulator [3] to capture the transitions of the memory data bus for the gcc, mcf, namd, soplex, h264, omnetpp, xalancbmk, perlbench2, bwaves, cactusADM, dealII, lbm, and astar benchmarks.

Without loss of generality, we assume that the TSVs are arranged in a 3 × N layout. We also suppose that the data bitwidth is 64, so we need eight 3 × 3 TSV clusters for the data and three clusters for the control lines.
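The coder and decoder behavior described in Section 4 can be illustrated for a single 3 × 3 cluster with the following Python sketch. It is written under stated assumptions rather than taken from the paper: the function names, the row-major bit order with the victim at index 4, and the guard that only a victim with an actual transition is retained are choices made here, and the class index uses the ordering inferred from Table 1.

```python
ST = 20  # switch threshold; the text reports 20 as the best setting

DIRECT = (1, 3, 5, 7)    # row-major indices of north, west, east, south
DIAGONAL = (0, 2, 6, 8)  # row-major indices of the four corners
VICTIM = 4               # middle TSV of the cluster


def crosstalk_class(prev, new):
    """Class index of the victim when the cluster goes from prev to new."""
    delta = [n - p for p, n in zip(prev, new)]  # +1 rise, -1 fall, 0 hold
    x = 1.5 * sum(abs(delta[VICTIM] - delta[i]) for i in DIRECT)
    x += sum(abs(delta[VICTIM] - delta[i]) for i in DIAGONAL)
    return 0 if x == 0 else int(2 * x) - 1


def encode(prev, data, ctrl):
    """Return (wire values, control bit) for 9 data bits on one cluster."""
    wires = list(data)
    switching = data[VICTIM] != prev[VICTIM]  # assumed guard, see lead-in
    if switching and crosstalk_class(prev, data) > ST:
        wires[VICTIM] = prev[VICTIM]  # retain the victim's previous value
        ctrl ^= 1                     # a control transition flags the retain
    return wires, ctrl


def decode(wires, ctrl, prev_ctrl):
    """Recover the 9 data bits from the wire values and the control bit."""
    data = list(wires)
    if ctrl != prev_ctrl:
        data[VICTIM] ^= 1  # the retained bit is the complement of the data bit
    return data


# Transition pattern of the first row of Table 2, expressed as bit vectors.
prev = [0, 0, 1, 0, 1, 0, 0, 0, 1]
data = [0, 1, 0, 0, 0, 1, 0, 1, 0]
wires, ctrl = encode(prev, data, ctrl=0)
recovered = decode(wires, ctrl, prev_ctrl=0)
```

In this sketch the decoder needs only the previous control value to detect a control transition, which reflects the statement that control TSVs switch only when the victim's value is not valid. How the coder tracks the physical TSV state across consecutive transfers is not specified in the text and is left out here.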
Fig. 8 shows the transmission delay for several benchmarks from the SPEC2006 suite. The delays are normalized to the case in which no crosstalk minimization technique is used. As Fig. 8 shows, 3DCAM can reduce the transmission delay of the benchmarks by 9% compared to the uncoded baseline, and in the best case 3DCAM reduces the transmission delay of the soplex benchmark by 25.7% compared to the 3DLAT method. Since 3DLAT (with ω = 4) codes the input data so that the coded output has no crosstalk class higher than 23C (based on our model), it can have a destructive effect on the transmission time of data with lower crosstalk classes. As most transitions of the soplex benchmark fall into lower crosstalk classes, below the average crosstalk class of the 3DLAT coded outputs, 3DLAT even increases the transmission delay of this benchmark. ShieldUS also could not effectively reduce the delay of the evaluated benchmarks, because this method can only reduce the transmission delay of benchmarks with highly regular data and signals.

Figure 6: Improvement of 3DCAM for different values of the threshold parameter (x-axis: Switch Threshold (ST); y-axis: normalized average delay)

Figure 7: Percentage of extra TSVs needed for the 3DCAM, 3DLAT (ω = 4), and ShieldUS mechanisms (x-axis: bitwidth in bits; y-axis: TSV overhead)
As the occurrence frequency of higher crosstalk classes directly affects signal integrity and transmission time, the frequency of crosstalk classes is measured before and after applying the 3DCAM mechanism. Fig. 9 and Fig. 10 show the occurrence frequency of crosstalk classes before and after applying 3DCAM, respectively. As these figures show, 3DCAM causes most crosstalk patterns to fall into the left side of the chart; that is, it shifts occurrences from the higher crosstalk classes to the lower ones.
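To make the link between a class histogram and average delay concrete, the following Python sketch applies Eq. (4) to per-class occurrence counts. It is a hedged illustration, not the evaluation flow used in the paper: the function names are hypothetical, the value passed for `lam2` is an arbitrary placeholder rather than the reported $\lambda_2$, the class-to-coupling mapping is the one inferred from Table 1, and the histograms are toy data, not measurements. With $\lambda_1 = 1.5\,\lambda_2$, the normalized delay of a class whose coupling is $x\,C_\beta$ becomes $1 + x\lambda_2$.

```python
def class_delay(k, lam2):
    """Delay of class k normalized to the ideal delay (Eq. (4), lambda_1 = 1.5 * lambda_2)."""
    x = 0.0 if k == 0 else (k + 1) / 2  # coupling in units of C_beta
    return 1.0 + x * lam2


def average_delay(histogram, lam2):
    """Occurrence-weighted mean delay; histogram[k] is the count of class k."""
    total = sum(histogram)
    weighted = sum(n * class_delay(k, lam2) for k, n in enumerate(histogram))
    return weighted / total


# Toy histograms over the 40 classes (made-up counts for illustration only).
before = [0] * 40
after = [0] * 40
before[5], before[24], before[39] = 50, 30, 20
after[5], after[12], after[19] = 50, 30, 20

LAM2 = 3.0  # placeholder value, not the figure reported in [4, 27]
ratio = average_delay(after, LAM2) / average_delay(before, LAM2)
```

Shifting counts from high classes to lower ones, as 3DCAM does in Fig. 9 and Fig. 10, lowers the weighted mean, so `ratio` falls below 1 for histograms of this shape. This sketch ignores the delay contribution of the control TSVs.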
As mentioned in Section 4, 3DCAM uses a simple coder, which leads to less area overhead than all previous works. Table 4 reports the coder and decoder area overhead of the ShieldUS, 3DLAT, and 3DCAM mechanisms. For a 3 × 3 TSV cluster in 3DLAT, the area of the coder and decoder is 4264 µm² due to its large comparators. We estimate the ShieldUS area overhead by the area of its crossbar, which is 218 µm². Finally, the area of 3DCAM is 116 µm². These mechanisms are implemented and synthesized in a 45 nm technology using Synopsys Design Compiler.

Figure 8: Average transmission delay of 12 benchmarks, normalized to the case in which no crosstalk minimization technique is used (y-axis: normalized delay)

Figure 9: The occurrence frequency of crosstalk classes before applying 3DCAM (x-axis: crosstalk class; y-axis: occurrence frequency; one series per benchmark: gcc, mcf, namd, soplex, h264, omnetpp, xalancbmk, perlbench2, bwaves, cactusADM, dealII, lbm, astar)

Table 4: Coder area overhead of different mechanisms

Mechanism                  Area (µm²)
ShieldUS (crossbar only)   218
3DLAT                      4264
3DCAM                      116

In this paper, a crosstalk avoidance mechanism for TSV-to-TSV coupling capacitance is presented and a different crosstalk model is discussed. The proposed mechanism decides between sending the original data and retaining the TSV value in its previous state based on a switch threshold (ST) parameter. 3DCAM reduces the frequency of higher crosstalk patterns, which reduces the interconnection delay. Compared to previous work, 3DCAM imposes negligible area overhead. The gem5 simulator is used to extract transitions of the data bus, and we used real benchmarks taken from the SPEC2006 suite. The simulation results show that 3DCAM reduces transmission delay by up to 25.7% while reducing TSV overhead by 62.5% compared to the 3DLAT mechanism (from 80% to 30%).

Figure 10: The occurrence frequency of crosstalk classes after applying 3DCAM (x-axis: crosstalk class; y-axis: occurrence frequency; same benchmarks as Fig. 9)

In our future work, first, we plan to consider a comprehensive crosstalk model based on both capacitive and inductive coupling effects. Since the inductive coupling effect will increase in the near future, it has to be considered in conjunction with the capacitive coupling effect. Second, we plan to present a probability model for each crosstalk class in TSVs. Third, we plan to develop an analytical reliability model for the TSV-to-TSV coupling effect in 3D ICs.
Acknowledgment
The authors would like to thank the anonymous reviewers for their comments which were very helpful in improving thequality and presentation of this paper.
References

[1] K. Agarwal, M. Agarwal, D. Sylvester, and D. Blaauw. Statistical interconnect metrics for physical-design optimization. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 25(7):1273–1288, 2006.
[2] C. J. Akl and M. A. Bayoumi. Reducing interconnect delay uncertainty via hybrid polarity repeater insertion. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 16(9):1230–1239, 2008.
[3] N. Binkert, B. Beckmann, G. Black, S. K. Reinhardt, A. Saidi, A. Basu, J. Hestness, D. R. Hower, T. Krishna, S. Sardashti, et al. The gem5 simulator. ACM SIGARCH Computer Architecture News, 39(2):1–7, 2011.
[4] Y. Y. Chang, Y. S. C. Huang, V. Narayanan, and C. T. King. ShieldUS: A novel design of dynamic shielding for eliminating 3D TSV crosstalk coupling noise. Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC, pages 675–680, 2013.
[5] C. Duan, V. H. Cordero Calle, and S. P. Khatri. Efficient on-chip crosstalk avoidance CODEC design. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 17(4):551–560, 2009.
[6] A. Eghbal, P. M. Yaghini, N. Bagherzadeh, and M. Khayambashi. TSV analytical fault tolerance assessment for 3D network-on-chip. Computers, IEEE Transactions on, PP(99):1–1, 2015.
[7] A. Eghbal, P. Yaghini, S. Yazdi, and N. Bagherzadeh. TSV-to-TSV inductive coupling-aware coding scheme for 3D network-on-chip. In Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), 2014 IEEE International Symposium on, pages 92–97, Oct 2014.
[8] A. Engin and S. Narasimhan. Modeling of crosstalk in through silicon vias. Electromagnetic Compatibility, IEEE Transactions on, 55(1):149–158, Feb 2013.
[9] H. Farrokhbakht, H. M. Kamali, and S. Hessabi. SMART: A scalable mapping and routing technique for power-gating in NoC routers. In Proceedings of the Eleventh IEEE/ACM International Symposium on Networks-on-Chip, NOCS '17, pages 15:1–15:8, New York, NY, USA, 2017. ACM.
[10] H. Farrokhbakht, M. Taram, B. Khaleghi, and S. Hessabi. TooT: An efficient and scalable power-gating method for NoC routers. In , pages 1–8, Aug 2016.
[11] A. Ganguly, P. Pande, and B. Belzer. Crosstalk-aware channel coding schemes for energy efficient and reliable NoC interconnects. Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, 17(11):1626–1639, Nov 2009.
[12] A. Ganguly, P. P. Pande, B. Belzer, and C. Grecu. Design of low power & reliable networks on chip through joint crosstalk avoidance and multiple error correction coding. Journal of Electronic Testing, 24(1-3):67–81, 2008.
[13] J. L. Henning. SPEC CPU2006 benchmark descriptions. ACM SIGARCH Computer Architecture News, 34(4):1–17, 2006.
[14] K. Hirose and H. Yasuura. A bus delay reduction technique considering crosstalk. In Design, Automation and Test in Europe Conference and Exhibition 2000. Proceedings, pages 441–445, 2000.
[15] S. Itr. ITRS 2012 Executive Summary. , 2012. Accessed: 2015-03-30.
[16] M. Jung, J. Mitra, D. Z. Pan, and S. K. Lim. TSV stress-aware full-chip mechanical reliability analysis and optimization for 3D IC. Commun. ACM, 57(1):107–115, Jan. 2014.
[17] R. Kumar and S. P. Khatri. Crosstalk avoidance codes for 3D VLSI. In Design, Automation Test in Europe Conference Exhibition (DATE), 2013, pages 1673–1678, March 2013.
[18] C. Liu, T. Song, J. Cho, J. Kim, J. Kim, and S. K. Lim. Full-chip TSV-to-TSV coupling analysis and optimization in 3D IC. In Proceedings of the 48th Design Automation Conference, DAC '11, pages 783–788, New York, NY, USA, 2011. ACM.
[19] G. Loh. 3D-stacked memory architectures for multi-core processors. In Computer Architecture, 2008. ISCA '08. 35th International Symposium on, pages 453–464, June 2008.
[20] R. Marculescu, U. Y. Ogras, L.-S. Peh, N. E. Jerger, and Y. Hoskote. Outstanding research problems in NoC design: system, microarchitecture, and circuit perspectives. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 28(1):3–21, 2009.
[21] M. Motoyoshi. Through-silicon via (TSV). Proceedings of the IEEE, 97(1):43–48, Jan 2009.
[22] C. Okoro, J. W. Lau, F. Golshany, K. Hummler, and Y. S. Obeng. A detailed failure analysis examination of the effect of thermal cycling on Cu TSV reliability. IEEE Transactions on Electron Devices, 61(1):15–22, 2014.
[23] P. P. Pande, A. Ganguly, B. Feero, and C. Grecu. Applicability of energy efficient coding methodology to address signal integrity in 3D NoC fabrics. Proceedings - IOLTS 2007 13th IEEE International On-Line Testing Symposium, (Iolts):161–166, 2007.
[24] M. J. Sepúlveda, J. Diguet, M. Strum, and G. Gogniat. NoC-based protection for SoC time-driven attacks. IEEE Embedded Systems Letters, 7(1):7–10, March 2015.
[25] S. R. Sridhara and N. R. Shanbhag. Coding for reliable on-chip buses: A class of fundamental bounds and practical codes. Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, 26(5):977–982, 2007.
[26] J. Z. J. Zhang and E. Friedman. Effect of shield insertion on reducing crosstalk noise between coupled interconnects. , 2, 2004.
[27] Q. Zou, D. Niu, Y. Cao, and Y. Xie. 3DLAT: TSV-based 3D ICs crosstalk minimization utilizing less adjacent transition code.