Adaptive 3D-IC TSV Fault Tolerance Structure Generation
Song Chen, Member, IEEE, Qi Xu, Student Member, IEEE, Bei Yu, Member, IEEE
Abstract: In three-dimensional integrated circuits (3D-ICs), the through-silicon via (TSV) is a critical technique for providing vertical connections. However, yield and reliability are key obstacles to adopting TSV-based 3D-IC technology in industry. Various fault-tolerance structures that use spare TSVs to repair faulty functional TSVs have been proposed in the literature to enhance yield and reliability, but a valid structure cannot always be found because effective methods for generating fault-tolerance structures are lacking. In this paper, we focus on the problem of adaptive fault-tolerance structure generation. Given the relations between functional TSVs and spare TSVs, we first calculate the maximum number of tolerant faults in each TSV group. Then we propose an integer linear programming (ILP) based model to construct adaptive fault-tolerance structures with minimal multiplexer delay overhead and hardware cost. We further develop a speed-up technique based on an efficient min-cost-max-flow (MCMF) model. All the proposed methodologies are embedded in a top-down TSV planning framework that forms functional TSV groups and generates adaptive fault-tolerance structures. Experimental results show that, compared with the state of the art, the number of spare TSVs used for fault tolerance can be effectively reduced.
Index Terms: 3D-IC, fault-tolerance, TSV planning, TSV yield.
I. INTRODUCTION
As device feature sizes continue to shrink rapidly, interconnect delay is becoming a bottleneck that limits IC performance. Three-dimensional integrated circuit (3D-IC) technology vertically stacks multiple dies connected by through-silicon vias (TSVs), providing a promising way to alleviate the interconnect problem and to achieve significant reductions in chip area, wirelength, and interconnect power [1]. A study indicates that the average wirelength of a 3D-IC varies with the square root of the number of layers [2]. Moreover, 3D-ICs offer the potential for heterogeneous integration, which is essential for More than Moore (MtM) technology [3]. 3D integration has already seen commercial applications in the form of 3D memory, but significant open problems remain in both research and implementation [4]. In this work, we focus on the TSV reliability problem.
This work was supported in part by the National Natural Science Foundation of China (NSFC) under grants No. 61674133 and 61404123, the Anhui Provincial Natural Science Foundation (1508085MF134, China), and the Research Grants Council of Hong Kong SAR (Project No. CUHK24209017). S. Chen and Q. Xu are with the Department of Electronic Science and Technology, University of Science and Technology of China, China. B. Yu is with the Department of Computer Science and Engineering, The Chinese University of Hong Kong, NT, Hong Kong.

TSVs may be affected by various reliability issues such as undercut, misalignment, or random open defects [5]. Because a chip contains a large number of TSVs, these issues in turn lead to low chip yield. For example, [5], [6] reported a 60% chip yield for a chip with 20000 TSVs and only 20% yield for 55000 TSVs in the IMEC process technology. Since yield and reliability are primary concerns in 3D-IC design, a robust fault-tolerance structure is imperative. In general, there are two types of yield loss in 3D-ICs: yield loss due to defects in the stacked dies and yield loss due to defects that occur during the assembly process [7]. For the former, it is critical to conduct pre-bond testing to avoid stacking defective dies [8]. A number of die/wafer matching and inter-die repair strategies have also been proposed to increase the stack yield [9]–[12]. For the latter, adding spare TSVs (referred to as s-TSVs) to repair faulty functional TSVs (referred to as f-TSVs) is an effective method for enhancing yield.

One key problem in TSV fault-tolerance design is fault-tolerance structure generation, where a number of functional TSVs and one or several spare TSVs are grouped together to provide redundancy. Chen et al. [6] proposed a minimum spanning tree based method to group f-TSVs and form one-fault-tolerance structures.
However, the method is difficult to apply to multiple-fault-tolerance structure generation. Wang et al. [13] presented a regular TSV replacing chain structure that can repair faulty TSVs based on a realistic clustered defect model. Xu et al. [14] further considered the physical information of the TSV groups and developed an ILP formulation for fault-tolerance structure generation. They model the replaceable relations between f-TSVs, so that the maximum input-port number of individual multiplexers can be effectively reduced. However, all previous works [13], [14] assume that a predetermined number of s-TSVs is assigned to each TSV group. To ensure that K common s-TSVs can be allocated to each f-TSV group, the number of f-TSVs in each group is usually kept quite small, which introduces a large number of TSV groups. Since the total number of s-TSVs is proportional to the number of TSV groups, this may cause overuse of s-TSVs.

To overcome this issue, in this paper we propose an adaptive fault-tolerance structure, in which the number of tolerant faults is adaptively determined by the distribution of the f-TSVs and their candidate s-TSVs. A set of s-TSVs is selected from a large pool of candidates. Our adaptive fault-tolerance structure generation method achieves minimal multiplexer delay overhead as well as a minimal number of required s-TSVs. Key technical contributions of this work are listed as follows.
• We determine the maximum number of tolerant faults, denoted as K, in polynomial time.
• We present an integer linear programming formulation for generating adaptive K-fault tolerance structures.
• We further propose an efficient min-cost-max-flow (MCMF) based heuristic method to speed up K-fault tolerance structure generation.
• All the proposed methodologies are embedded in a top-down TSV planning framework that forms f-TSV groups and generates fault-tolerance structures.
Experimental results show that, compared with the state of the art, the proposed framework can reduce the number of used s-TSVs and the maximum port number of multiplexers.

Fig. 1: (a) An example of a TSV group with four f-TSVs and two s-TSVs; (b) A fault-tolerance structure with large multiplexer delay overhead; (c) A regular chain structure.

The remainder of this paper is organized as follows. Section II presents the motivation and gives the problem formulation. The method for determining the maximum number of tolerant faults is presented in Section III. Sections IV and V present the proposed ILP formulation and heuristic method, respectively. Section VI describes the proposed fault-tolerance TSV planning methodology. Section VII provides experimental results, followed by the conclusion in Section VIII.

II. PRELIMINARIES
A. Chip Yield and TSV Yield
Consider a 3D-IC containing $l$ layers, where the yield of the $i$-th layer die is $Y_{die_i}$. The yield of wafer-to-wafer (W2W) stacking, $Y_{stack}$, can be roughly modeled as [7]:

$$Y_{stack} = \prod_{i=1}^{l} Y_{die_i} \qquad (1)$$

Therefore, defects in each die directly affect the overall chip yield after stacking.

Besides, during bonding, any foreign particle caught between the wafers can lead to peeling as well as delamination, which dramatically reduces bonding quality and yield [15]. $Y_{Bonding}$ captures the yield loss of the chip due to faults in the bonding processes. According to the cumulative yield property, the yield of a 3D chip, $Y_{3D\text{-}chip}$, can be formulated as follows [7]:

$$Y_{3D\text{-}chip} = Y_{stack} \cdot \prod_{i=1}^{l-1} \left( Y_{Bonding(i)} \cdot Y_{TSV(i)} \right), \qquad (2)$$

where $Y_{Bonding(i)}$ is the yield of the $i$-th bonding step and $Y_{TSV(i)}$ is the TSV yield in the $i$-th layer. In our work, we focus on enhancing the 3D chip yield in terms of the TSV yield $Y_{TSV}$ [13]. The total TSV yield $Y_{TSV}$ is calculated by multiplying the yields $Y_{g_j}$ of all f-TSV groups:

$$Y_{TSV} = \prod_{j=1}^{N} Y_{g_j}, \qquad (3)$$

where $N$ is the number of f-TSV groups. In this paper, we adopt the algorithm described in [13] to calculate the group yield $Y_{g_j}$.

B. TSV Fault-Tolerance Structure
By inserting multiplexers (including control circuits) and carefully designing reconfigurable TSV replacing paths, we can construct TSV fault-tolerance structures in which s-TSVs transfer signals in the presence of faulty f-TSVs [5].

Given an f-TSV planning result, we know the number and positions of all f-TSVs. We then perform top-down iterative f-TSV partitioning to form f-TSV groups and allocate s-TSVs in the whitespace for each group. The number and positions of the s-TSVs used by each f-TSV group are determined simultaneously in the f-TSV partitioning stage.

Fig. 1(a) shows an example of a TSV group with four f-TSVs ($f_1 \cdots f_4$) and two s-TSVs ($s_1$ and $s_2$). Here $f_1 \cdots f_4$ belong to nets $nt_1 \cdots nt_4$, respectively. The large dashed rectangles represent the bounding boxes of the nets. Without loss of generality, we define the bounding box of an f-TSV $f_i$ as the bounding box of the net to which $f_i$ belongs. We say that an f-TSV $f_i$ can be replaced by another TSV $v$ if and only if $v$ is located inside or near the bounding box of $f_i$. Note that $v$ can be either an f-TSV or an s-TSV. For example, $f_i$ is replaceable by two f-TSVs and two s-TSVs when these four TSVs are all covered by the bounding box of $f_i$.

Fig. 2: (a) An example of a TSV group with five f-TSVs and four s-TSVs, which cannot be handled by previous works; (b) The adaptive fault-tolerance structure generated by our proposed methodology.

Given a TSV group with some f-TSVs and $K$ s-TSVs, a $K$-fault tolerance structure includes $K$ independent directed TSV-replacing paths from each f-TSV to the s-TSVs. In this structure, we can repair at most $K$ faulty f-TSVs through multiplexer rerouting. For instance, for the TSV group shown in Fig. 1(a), a 2-fault tolerance structure with two s-TSVs can be generated as in Fig. 1(b), where each f-TSV is directly connected to all s-TSVs. Although this design scheme is very simple, the structure suffers from large delay overhead due to the large multiplexer input size. Some recent works [13], [14] proposed regular $K$-fault tolerance structures, as shown in Fig. 1(c). Here each f-TSV is regularly connected to its two right-side neighbouring TSVs, and the rightmost f-TSVs are connected to the s-TSVs. Instead of the 4-port multiplexers required in Fig. 1(b), only 3-port and 2-port multiplexers are needed here. For each f-TSV, the independent TSV-replacing paths are listed as follows.
$f_1$: $\{f_1 \to f_3 \to s_1\}$, $\{f_1 \to f_2 \to f_4 \to s_2\}$.
$f_2$: $\{f_2 \to f_3 \to s_1\}$, $\{f_2 \to f_4 \to s_2\}$.
$f_3$: $\{f_3 \to s_1\}$, $\{f_3 \to f_4 \to s_2\}$.
$f_4$: $\{f_4 \to s_1\}$, $\{f_4 \to s_2\}$.

To ensure the existence of fault-tolerance structures in TSV groups, previous works (e.g., [13], [14]) form TSV groups under two constraints: (1) a $K$-fault tolerance structure uses exactly $K$ s-TSVs, and (2) every f-TSV in a group can be replaced by any s-TSV within the group. Fig. 1(a) shows an example of a TSV group with a two-fault tolerance structure, where all the f-TSVs, $f_1$, $f_2$, $f_3$, and $f_4$, can be replaced by both $s_1$ and $s_2$ considering the net bounding boxes. Unfortunately, general cases may violate these constraints. Fig. 2(a) shows a generalized example involving five f-TSVs ($f_1 \cdots f_5$) and four s-TSVs ($s_1 \cdots s_4$). The replaceable relations between the TSVs are shown in Fig. 3(a). In this TSV group, constraint (1) is violated, since no two-fault tolerance structure can be found if only two s-TSVs are used. Constraint (2) is also violated, even if the group is partitioned into smaller groups, since one of the f-TSVs has no replaceable s-TSVs. Consequently, the method in [13] cannot generate cost-effective fault-tolerance structures for this TSV group, because that f-TSV has no candidate s-TSVs. The ILP-based method in [14] cannot generate fault-tolerance structures for this TSV group either, since the number of tolerant faults is unknown. However, the f-TSV group does contain a two-fault tolerance structure, as shown in Fig. 3(c), where three of the four s-TSVs are used. The possible TSV replacing paths are as follows.
f: {f → s}, {f → f → f → f → s}.
f: {f → f → f → s}, {f → f → f → s}.
f: {f → s}, {f → f → s}.
f: {f → f → s}, {f → s}.
f: {f → f → s}, {f → s}.

In reality, there is no essential difference between f-TSVs and s-TSVs. Therefore, existing TSV testing techniques can be directly adopted to test both f-TSVs and s-TSVs [16], and the control signals of the multiplexers can be set to determine the direction of signal transfer. As shown in Fig. 2(b), the control signals of the 2-to-1 and 3-to-1 multiplexers are 1-bit and 2-bit, respectively. When all TSVs are fault-free or only s-TSVs are faulty, the control signals of each multiplexer are set to transfer signals through their corresponding f-TSVs. Once an f-TSV is faulty, however, the reconfigurable routing paths are determined by the corresponding multiplexer control signals. For instance, when f-TSV 1 is faulty, the control signals of multiplexers 6 and 7 are set to 0 and 10, respectively, causing s-TSV 1 to reroute signal A.

C. Hardware Cost and Multiplexer Delay Overhead
The hardware cost incurred by a fault-tolerance structure can be divided into several parts, including the area overhead due to inserted s-TSVs, the related control logic (i.e., MUXes), and the re-routing interconnect [13]. The cost is dominated by the first two parts [12]. Jiang et al. [17] point out that the area of the control logic is negligible compared with the TSV size and that the TSV manufacturing cost is much larger than that of logic gates. Therefore, to reduce the hardware cost, we should reduce the number of s-TSVs used in the fault-tolerance structures.

Fig. 3: (a) The directed graph $G$ corresponding to the layout in Fig. 2(a); (b) The corresponding splitting graph $G'$; (c) A 2-fault tolerance structure on graph $G$.

The delay of a multiplexer increases with its number of ports, so a large multiplexer introduces a large delay overhead. Moreover, the proposed TSV fault-tolerance planning is performed at the floorplanning stage, where no exact timing information is available. Minimizing the multiplexer delay overhead at this stage alleviates the timing closure issue in the subsequent placement and routing stage. Therefore, in our work, we consider the multiplexer delay overhead as one of the optimization objectives.

D. Problem Formulation
The example in Fig. 2 shows that new design challenges arise when not all s-TSVs are occupied in constructing a $K$-fault tolerance structure. Given a TSV group with $m$ f-TSVs and $n$ s-TSVs, we first construct a directed graph $G(V, E)$ consisting of all TSV replaceable relations. Here the vertex set is $V = V_1 \cup V_2$, where $V_1 = \{f_i \mid i = 1, \cdots, m\}$ is the f-TSV set and $V_2 = \{s_i \mid i = 1, \cdots, n\}$ is the s-TSV set. The edge set is $E = \{(u, v) \mid u \in V_1 \wedge v \in V \wedge u \text{ can be replaced by } v\}$. Given the TSV group in Fig. 2(a), the corresponding replaceable relation graph is shown in Fig. 3(a).
We define the problem of TSV fault-tolerance structure generation as follows.
Problem 1.
Given a TSV group with $m$ f-TSVs and $n$ s-TSVs, and the directed graph $G(V, E)$, we search for the maximum number of tolerant faults $K$. Then we generate a $K$-fault tolerance structure, which includes $K$ independent (vertex-disjoint) TSV replacing paths for each f-TSV, to minimize both the multiplexer delay overhead and the number of used s-TSVs.

Notice that the yield of a TSV group is evaluated based on its allocated s-TSVs and its f-TSVs. With the yields of the TSV groups, the total TSV yield can be calculated as discussed in Section II-A. If the target TSV yield is not satisfied, a TSV group is selected and partitioned into two smaller TSV groups, for which the above TSV fault-tolerance structure generation problem is solved again. New TSV groups are iteratively generated until the target chip yield is satisfied.

III. MAX FLOW BASED METHODOLOGY
Given a TSV group with replaceable relation graph $G$, we say that the TSV group has a $K$-fault tolerance structure if each f-TSV $f \in V_1$ has $K$ paths to s-TSV vertices in $G$ and, for each f-TSV $f$, these paths are vertex-disjoint except for $f$ itself. In this section, we develop a polynomial-time algorithm to determine the value of $K$ for a TSV group. Our methodology is based on Menger's theorem, stated as follows.
Lemma 1 (Menger's theorem [18]). Let $G$ be a directed graph, and let $S$ and $T$ be disjoint sets of vertices in $G$. Then the maximum number of vertex-disjoint $S$-$T$ paths is equal to the minimum size of an $S$-$T$ disconnecting vertex set.
Here an $S$-$T$ disconnecting vertex set is a vertex set whose removal leaves no path from any vertex in $S$ to any vertex in $T$. According to Lemma 1, for each f-TSV $f$, the number of vertex-disjoint paths $N_d(f)$ equals the minimum size of an $\{f\}$-$V_2$ disconnecting vertex set in $G$. For example, in Fig. 3(a), a set consisting of one f-TSV and one s-TSV is a minimum disconnecting vertex set between a certain f-TSV and $V_2$; therefore, the number of vertex-disjoint paths from that f-TSV equals 2. Based on the above lemma, we reach the following theorem:
Theorem 1.
Given the replaceable relation graph, the maximum number of tolerant faults, $K$, can be determined in polynomial time as follows:

$$K = \min_{f \in V_1} \{ N_d(f) \}. \qquad (4)$$

Since the vertex-disjoint path problem is not easy to model directly, we perform vertex splitting on $G(V, E)$ to transform it into an edge-disjoint path problem, which can be modeled as a maximum flow problem. Each vertex $u \in V$ is split into two vertices, $u$ and $u'$, corresponding to the vertex's input and output, respectively, and an extra edge $(u, u')$ with zero cost is added. A new directed graph $G'(V', E')$ is constructed as follows.
• The vertex set is $V' = V \cup V'_1 \cup V'_2$, where $V'_1$ is the split vertex set of $V_1$ and $V'_2$ is the split vertex set of $V_2$.
• The edge set is $E' = E'_1 \cup E'_2$, where $E'_1 = \{(u, u') \mid u \in V \wedge u' \text{ is the corresponding split vertex of } u\}$ and $E'_2 = \{(u', v) \mid (u, v) \in E(G) \wedge u' \text{ is the corresponding split vertex of } u\}$. That is, if there is a directed edge from $u$ to $v$ in $E(G)$, a corresponding directed edge from $u'$ to $v$ is added to $E'(G')$.

TABLE I: Notations used in ILP.
$V$, $V'$: set of f-TSVs and s-TSVs; set of split f-TSVs and split s-TSVs
$V_1$, $V'_1$: set of f-TSVs; set of split f-TSVs
$V_2$, $V'_2$: set of s-TSVs; set of split s-TSVs
$f_i$, $f'_i$: f-TSV in $V_1$; split f-TSV in $V'_1$
$s_j$, $s'_j$: s-TSV in $V_2$; split s-TSV in $V'_2$
$E'$: set of all edges in graph $G'$
$E'_1$: set of all splitting edges in graph $G'$ ($f_i \to f'_i$ and $s_j \to s'_j$)
$E'_2$: set of all replaceable edges in graph $G'$
$(w, w')$: edge in $E'_1$, with $w \in V$
$s$, $t$: split f-TSV in $V'_1$; split s-TSV in $V'_2$
$v^{(s,t)}$: binary variable; $v^{(s,t)} = 1$ if a unit flow (path) exists from $s$ to $t$, otherwise $v^{(s,t)} = 0$
$(v, u)$: edge in $E'_2$
$x^{(s,t)}_{vu}$: binary variable; $x^{(s,t)}_{vu} = 1$ if a unit flow (path) from $s$ to $t$ goes through edge $(v, u)$, otherwise $x^{(s,t)}_{vu} = 0$
$d_{vu}$: binary variable on edge $(v, u)$; $d_{vu} = 1$ if a unit flow (path) goes through edge $(v, u)$, otherwise $d_{vu} = 0$

Based on the splitting graph, the maximum number of tolerant faults $K$ can be determined in polynomial time by solving a max-flow problem [18] for each f-TSV. For instance, given the replaceable relation graph $G(V, E)$ in Fig. 3(a), Fig. 3(b) illustrates the splitting graph $G'(V', E')$. The numbers of edge-disjoint paths of the f-TSVs are $N_d(f_1) = 2$, $N_d(f_2) = 2$, $N_d(f_3) = 3$, $N_d(f_4) = 3$, and $N_d(f_5) = 3$. Since $f_1$ and $f_2$ have only two edge-disjoint paths, the maximum number of tolerant faults $K$ equals 2. The fault-tolerance structure can then be generated by finding $m \times K$ paths, each of which begins at a split f-TSV in $V'_1$ and ends at a split s-TSV in $V'_2$. In addition, all paths sharing the same source vertex must be edge-disjoint.
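The max-flow computation of $K$ described above can be sketched as follows. This is a minimal Python illustration under our own assumptions, not the authors' implementation: TSVs are identified by string names, a replaceable relation is given as a pair (u, v) meaning "u can be replaced by v", the split vertex of u is named u + "'", and the super-sink name "T" and all helper names are hypothetical. Each max flow is computed with BFS augmenting paths (Edmonds-Karp style) on the vertex-split graph, and the example graph is a made-up chain in the spirit of Fig. 1(c), not the graph of Fig. 3(a).

```python
from collections import deque

def build_split_graph(f_tsvs, s_tsvs, replace_edges):
    """Residual-capacity map of G' plus a super sink "T".

    Each TSV u becomes u (input) and u' (output) joined by a unit-capacity
    splitting edge (E'_1); each replaceable relation (u, v) becomes the unit
    arc u' -> v (E'_2); every split s-TSV s' feeds the super sink.
    """
    cap = {}

    def add_arc(a, b):
        cap.setdefault(a, {}).setdefault(b, 0)
        cap.setdefault(b, {}).setdefault(a, 0)   # reverse residual arc
        cap[a][b] += 1

    for u in list(f_tsvs) + list(s_tsvs):
        add_arc(u, u + "'")
    for u, v in replace_edges:
        add_arc(u + "'", v)
    for s in s_tsvs:
        add_arc(s + "'", "T")
    return cap

def max_flow(cap, source, sink):
    """Augment along shortest (BFS) residual paths until none remains."""
    flow = 0
    while True:
        parent, queue = {source: None}, deque([source])
        while queue and sink not in parent:
            a = queue.popleft()
            for b, c in cap[a].items():
                if c > 0 and b not in parent:
                    parent[b] = a
                    queue.append(b)
        if sink not in parent:
            return flow
        b = sink
        while parent[b] is not None:             # capacities are integers,
            a = parent[b]                        # so pushing one unit is valid
            cap[a][b] -= 1
            cap[b][a] += 1
            b = a
        flow += 1

def max_tolerant_faults(f_tsvs, s_tsvs, replace_edges):
    """Return K = min_f N_d(f) (Eq. (4)) and the per-f-TSV values N_d(f)."""
    nd = {}
    for f in f_tsvs:
        cap = build_split_graph(f_tsvs, s_tsvs, replace_edges)  # fresh residuals
        nd[f] = max_flow(cap, f + "'", "T")
    return min(nd.values()), nd

# Made-up chain: each TSV is replaceable by its two right-hand neighbours.
chain = [("f1", "f2"), ("f1", "f3"), ("f2", "f3"), ("f2", "f4"),
         ("f3", "f4"), ("f3", "s1"), ("f4", "s1"), ("f4", "s2")]
K, nd = max_tolerant_faults(["f1", "f2", "f3", "f4"], ["s1", "s2"], chain)
print(K, nd)  # 2 {'f1': 2, 'f2': 2, 'f3': 2, 'f4': 2}
```

Because every splitting edge has capacity 1, no intermediate TSV can carry two units of flow, so each max-flow value equals the number of vertex-disjoint replacing paths $N_d(f)$. Rebuilding the residual graph per f-TSV keeps the sketch simple at the cost of $m$ graph constructions.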
In the next two sections, we propose an ILP based algorithm and a min-cost-max-flow based heuristic method to generate the $K$-fault tolerance structure while minimizing both the number of used s-TSVs and the multiplexer delay overhead.

IV. INTEGER LINEAR PROGRAMMING FORMULATION
In this section, we discuss how the $K$ edge-disjoint path search problem can be formulated as an integer program. For convenience, the notations used in this section are listed in TABLE I.

First, we review the integer programming formulation in [14], which generates fault-tolerance structures while minimizing the multiplexer delay overhead.

To model the delay of each multiplexer, it is important to calculate the indegree of each vertex $u \in V$. As shown in Fig. 3(b), an edge $(f'_a, f_b)$ may lie on the paths of two different split f-TSVs. Although the same edge is traversed by two paths, it increases the indegree of $f_b$ by only one. Meanwhile, several edges on the paths may be directed into the same TSV vertex; for instance, two edges $(f'_a, f_c)$ and $(f'_b, f_c)$ increase the indegree of $f_c$ by two. Given a vertex $u \in V$, its indegree is calculated by the following equation:

$$indegree(u) = \sum_{v:(v,u) \in E'_2} \min\Big( \sum_{s \in V'_1,\, t \in V'_2} x^{(s,t)}_{vu},\ 1 \Big). \qquad (5)$$

The starting integer programming formulation of the fault-tolerance structure generation problem in [14] is shown in Formula (6). Its objective is to minimize the maximum indegree over all vertices. The number of binary variables $x^{(s,t)}_{vu}$ is $m \times n \times |E'|$, where $m$ is the number of f-TSVs, $n$ is the number of s-TSVs, and $|E'|$ is the number of edges in the split directed graph $G'$. Constraint (6a) defines a unit flow from $s \in V'_1$ to $t \in V'_2$, which corresponds to a path from $s$, an f-TSV, to $t$, an s-TSV; there are $m \times n \times |V'|$ such constraints. Constraint (6b) ensures that the paths sharing the same source $s \in V'_1$ are edge-disjoint; there are $m \times (m + n)$ such constraints.

$$\min \max_{u \in V} indegree(u) \qquad (6)$$
s.t.
$$\sum_{v:(u,v) \in E'} x^{(s,t)}_{uv} - \sum_{v:(v,u) \in E'} x^{(s,t)}_{vu} = \begin{cases} 1, & \text{if } u = s,\\ 0, & \text{if } u \in V' - \{s, t\},\\ -1, & \text{if } u = t; \end{cases} \quad \forall s \in V'_1,\ t \in V'_2, \qquad (6a)$$
$$\sum_{t \in V'_2} x^{(s,t)}_{uu'} \le 1, \quad \forall s \in V'_1,\ (u, u') \in E'_1, \qquad (6b)$$
$$x^{(s,t)}_{vu} \in \{0, 1\}, \quad \forall (v, u) \in E',\ s \in V'_1,\ t \in V'_2. \qquad (6c)$$

Although the integer programming method in [14] can generate $K$-fault tolerance structures using $K$ s-TSVs, it cannot be directly applied to generate adaptive fault-tolerance structures, in which the number of s-TSVs may be larger than $K$. We therefore propose a new integer programming formulation that generates adaptive fault-tolerance structures while minimizing both the number of used s-TSVs and the multiplexer delay overhead. The number of s-TSVs used in the structure is calculated by Equation (7):

$$usedstsv = \sum_{w \in V_2} \min\Big( \sum_{s \in V'_1,\, t \in V'_2} x^{(s,t)}_{ww'},\ 1 \Big). \qquad (7)$$

Based on the above notations, the edge-disjoint path search problem can be formulated as the integer program (8).

Compared with integer program (6), constraint (8a) introduces a new binary variable $v^{(s,t)}$ that indicates whether a unit flow (path) exists from source $s \in V'_1$ to sink $t \in V'_2$. Besides, a new constraint (8b) ensures that there are $K$ paths from each source $s \in V'_1$ to vertices in $V'_2$; there are $m$ such constraints. In this way, Formula (8) can be applied for any $K \le n$ and additionally minimizes the number of required s-TSVs in the structure, whereas Formula (6) can only be applied for the case $K = n$.

$$\min \Big\{ \max_{u \in V} indegree(u) + usedstsv \Big\} \qquad (8)$$
s.t.
$$\sum_{v:(u,v) \in E'} x^{(s,t)}_{uv} - \sum_{v:(v,u) \in E'} x^{(s,t)}_{vu} = \begin{cases} v^{(s,t)}, & \text{if } u = s,\\ 0, & \text{if } u \in V' - \{s, t\},\\ -v^{(s,t)}, & \text{if } u = t; \end{cases} \quad \forall s \in V'_1,\ t \in V'_2, \qquad (8a)$$
$$\sum_{t \in V'_2} v^{(s,t)} = K, \quad \forall s \in V'_1, \qquad (8b)$$
$$v^{(s,t)} \in \{0, 1\}, \quad \forall s \in V'_1,\ t \in V'_2, \qquad (8c)$$
$$\text{(6b)–(6c)}.$$

Formula (8) is non-linear due to the min-max-min and min-min operations in its objective function. By linearizing the objective function, Formula (8) can be transformed into the integer linear program (ILP) (9). For each edge $(v, u) \in E'$, an extra binary variable $d_{vu}$ and extra constraints (9a)–(9c) are introduced to replace the min operations in Equations (5) and (7). Besides, constraint (9d) ensures that the indegree of every TSV is no greater than $\lambda_1$, and constraint (9e) ensures that the number of s-TSVs used in the structure equals $\lambda_2$.

$$\min\ (\lambda_1 + \lambda_2) \qquad (9)$$
s.t.
$$d_{vu} \ge x^{(s,t)}_{vu}, \quad \forall s \in V'_1,\ t \in V'_2,\ (v, u) \in E', \qquad (9a)$$
$$d_{vu} \le \sum_{s \in V'_1,\, t \in V'_2} x^{(s,t)}_{vu}, \quad \forall (v, u) \in E', \qquad (9b)$$
$$d_{vu} \in \{0, 1\}, \quad \forall (v, u) \in E', \qquad (9c)$$
$$\sum_{v:(v,u) \in E'_2} d_{vu} \le \lambda_1, \quad \forall u \in V, \qquad (9d)$$
$$\sum_{(w,w') \in E'_1,\, w \in V_2} d_{ww'} = \lambda_2, \qquad (9e)$$
$$\text{(6b)–(6c), (8a)–(8c)}.$$

For instance, in Fig. 3(b), the blue lines represent the edge-disjoint paths of each split f-TSV, and the corresponding generated 2-fault tolerance structure is shown in Fig. 2(b).

V. HEURISTIC FRAMEWORK
For large TSV groups, the ILP based method is very time-consuming. Consequently, in this section we propose a min-cost-max-flow (MCMF) based heuristic method to solve the edge-disjoint path problem. The basic idea is to process the f-TSVs one by one and, for each f-TSV, use a min-cost-max-flow algorithm to find $K$ independent paths. The edge costs are defined to keep both the input port number of the multiplexers and the number of s-TSVs as small as possible.

A. Network Graph Model
To find $K$ ($K \le n$) edge-disjoint paths for an f-TSV $f \in V_1$, we construct a directed graph $G_s(V_s, E_s)$ from $G'$ by adding an extra sink vertex $r$ and some edges. The vertex set is $V_s = V' \cup \{r\}$, where $r$ is the sink vertex, and the edge set is $E_s = E' \cup \{(s', r) \mid s' \in V'_2\}$, i.e., an edge from every split s-TSV to $r$.

When finding edge-disjoint paths for a certain TSV $f_i \in V_1$, the edge capacities are defined as follows: the capacity of the edge from $f_i$ to its split vertex $f'_i$ equals $K$, while the capacities of all other edges are set to 1. These capacity constraints ensure that we can find up to $K$ edge-disjoint paths from $f'_i$ to s-TSV vertices, which correspond to $K$ independent TSV-replacing chains for TSV $f_i$.

The costs of the splitting edges of f-TSVs are defined as zero, while the costs of the splitting edges of s-TSVs are defined as follows:

$$ec_s(w, w') = \begin{cases} 0, & \text{if } (w, w') \in E'_1,\ w \in V_2, \text{ and } w \text{ has been used},\\ C^{K}, & \text{if } (w, w') \in E'_1,\ w \in V_2, \text{ and } w \text{ has not been used}. \end{cases} \qquad (10)$$

Here $C$ is a constant that represents the cost of introducing a new s-TSV into the fault-tolerance structure, so these edge costs tend to restrict the use of new s-TSVs. In the experiments, we set $C$ to 3 based on the experimental results shown in Section VII-A.

For the edges in $E'_2$, which correspond to the replaceable relations between TSVs, the edge costs are defined as follows:

$$ec_s(u, v) = \begin{cases} 0, & \text{if } (u, v) \in E'_2 \text{ and } (u, v) \text{ corresponds to a TSV connection},\\ C^{tc[v]}, & \text{if } (u, v) \in E'_2 \text{ and } (u, v) \text{ does not correspond to a TSV connection}. \end{cases} \qquad (11)$$

In the edge cost function (11), $tc[v]$ is the number of edges that end at $v$ and have already been used as TSV connections in the generated partial fault-tolerance structure, that is, the edges that have been traversed by the edge-disjoint paths of other f-TSVs.
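The network $G_s$ with the capacities above and the costs of Eqs. (10) and (11), together with one pass that processes the f-TSVs one by one, can be sketched as follows. This is an illustration under our own assumptions, not the authors' implementation: TSVs are string names, split vertices carry a trailing apostrophe, the sink is the hypothetical node "r", and the min-cost flow is solved by successive shortest paths with Bellman-Ford, which is only one possible MCMF algorithm. The random post-processing stage is omitted, and all function names and example graphs are hypothetical.

```python
INF = float("inf")

class MinCostFlow:
    """Successive shortest paths; Bellman-Ford copes with the negative-cost
    reverse arcs that appear in the residual network."""

    def __init__(self):
        self.adj = {}                      # node -> indices of outgoing arcs
        self.head, self.cap, self.cost = [], [], []

    def add_arc(self, a, b, cap, cost):
        """Add arc a->b plus its residual twin b->a; return the forward index."""
        for x, y, c, w in ((a, b, cap, cost), (b, a, 0, -cost)):
            self.adj.setdefault(x, []).append(len(self.head))
            self.head.append(y)
            self.cap.append(c)
            self.cost.append(w)
        return len(self.head) - 2

    def run(self, src, dst, demand):
        """Send up to `demand` units, one per cheapest residual path."""
        flow = 0
        while flow < demand:
            dist, via, changed = {src: 0}, {}, True
            while changed:                 # Bellman-Ford relaxation
                changed = False
                for a, arcs in self.adj.items():
                    if a not in dist:
                        continue
                    for e in arcs:
                        b = self.head[e]
                        if self.cap[e] > 0 and dist[a] + self.cost[e] < dist.get(b, INF):
                            dist[b], via[b], changed = dist[a] + self.cost[e], e, True
            if dst not in dist:
                break                      # no augmenting path left
            b = dst
            while b != src:                # push one unit along the path
                e = via[b]
                self.cap[e] -= 1
                self.cap[e ^ 1] += 1
                b = self.head[e ^ 1]
            flow += 1
        return flow

def solve_f_tsv(fi, f_tsvs, s_tsvs, edges, K, tc, conn, used, C=3):
    """Build G_s for f-TSV fi and return the replaceable edges its K
    edge-disjoint paths traverse, or None if fewer than K paths exist."""
    net = MinCostFlow()
    for f in f_tsvs:                       # f-TSV splitting edges cost 0
        net.add_arc(f, f + "'", K if f == fi else 1, 0)
    for s in s_tsvs:                       # Eq. (10), then split s-TSV -> sink r
        net.add_arc(s, s + "'", 1, 0 if s in used else C ** K)
        net.add_arc(s + "'", "r", 1, 0)
    arc = {}
    for u, v in edges:                     # Eq. (11)
        arc[u, v] = net.add_arc(u + "'", v, 1, 0 if (u, v) in conn else C ** tc[v])
    if net.run(fi, "r", K) < K:
        return None
    # A forward arc carries flow iff its residual twin gained capacity.
    return {uv for uv, e in arc.items() if net.cap[e ^ 1] > 0}

def build_structure(f_tsvs, s_tsvs, edges, K, C=3):
    """Process f-TSVs in order, updating tc[v], the TSV connections,
    and the set of used s-TSVs after each MCMF solve."""
    tc = {v: 0 for v in list(f_tsvs) + list(s_tsvs)}
    conn, used = set(), set()
    for fi in f_tsvs:
        new = solve_f_tsv(fi, f_tsvs, s_tsvs, edges, K, tc, conn, used, C)
        if new is None:
            raise ValueError(f"{fi} has fewer than {K} disjoint paths")
        for _, v in new - conn:
            tc[v] += 1                     # one more connection ending at v
        conn |= new
        used |= {v for _, v in new if v in s_tsvs}
    return conn, used, tc

# Hypothetical group: two f-TSVs, each replaceable by the same three s-TSVs.
edges = [(f, s) for f in ("f1", "f2") for s in ("s1", "s2", "s3")]
conn, used, tc = build_structure(["f1", "f2"], ["s1", "s2", "s3"], edges, K=2)
print(sorted(used))  # f2 reuses the two s-TSVs chosen for f1
```

In this example, the $C^K = 9$ penalty of Eq. (10) makes the second f-TSV prefer the two s-TSVs already opened for the first one (cost $3 + 3$) over a fresh s-TSV (cost $1 + 9$), which is the intended bias toward fewer used s-TSVs.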
The count $tc[v]$ thus corresponds to the input port number of the multiplexer on the input side of TSV $v$.
With this edge cost function, first, we tend to make full use of existing TSV connections when building the edge-disjoint paths for the current f-TSV, since doing so does not increase the input ports of the multiplexers. Second, to minimize the maximum multiplexer size, the costs of edges that do not correspond to TSV connections are defined as an exponential function of $tc[v]$.

B. Algorithmic Flow of Heuristic
The algorithmic flow of the proposed heuristic is summarized in Algorithm 1. Because the solution quality depends on the order in which the f-TSVs are selected, an iterative post-processing stage is used to improve the generated fault-tolerance structures. In the post-processing stage, we randomly select an f-TSV and define the edge costs based on the TSV paths of all the other f-TSVs. Then we re-solve the min-cost-max-flow model to find edge-disjoint paths for the selected f-TSV. The procedure is repeated until the maximum input port number of the multiplexers remains unchanged for a predefined threshold number of iterations.

Fig. 4: Labels on edges represent (capacity, cost): (a) The min-cost-max-flow network for the first selected f-TSV f', where its two edge-disjoint paths are {f' → s → s'} and {f' → f → f' → f → f' → s → s'}; (b) After solving the first f-TSV, the min-cost-max-flow network for the second f-TSV f', where its two edge-disjoint paths are {f' → f → f' → f → f' → s → s'} and {f' → f → f' → s → s'}; (c) After solving the first two f-TSVs, the min-cost-max-flow network for the third f-TSV f', where its two edge-disjoint paths are {f' → s → s'} and {f' → s → s'}.

Algorithm 1 Pseudo code of our heuristic method
Input: A directed graph $G'(V', E')$, which contains $m$ f-TSVs and $n$ s-TSVs.
Output: A repairable structure including $m \times K$ paths.
for f-TSV $f_i \leftarrow 1$ to $m$ do
    Construct a directed graph $G_s(V_s, E_s)$ for $f_i$;
    ▷ Find $K$ edge-disjoint paths for $f_i$;
    Solve the MCMF model for $f_i$;
end for
▷ Perturb the repairable structure;
while not converged do
    Randomly select an f-TSV $f_i$;
    Re-solve the edge-disjoint paths for $f_i$ by MCMF;
    Record the maximum number of TSV connections on all TSVs;
end while

Fig. 5: The 2-fault tolerance structure generated by solving the edge-disjoint paths for all f-TSVs, where the TSV connections are shown as solid edges.

Fig. 4(a)–Fig. 4(c) illustrate the process of the heuristic method. We start with the first selected f-TSV, whose min-cost-max-flow network is shown in Fig. 4(a). The costs of all edges that end at f-TSVs and s-TSVs are initialized at 1 ($= C^0$), since there are no other f-TSV paths yet and $tc[v] = 0$ for all $v$. By solving the min-cost-max-flow problem, we obtain edge-disjoint paths that correspond to two independent TSV replacing chains for this f-TSV, together with the TSV connections (solid edges) of the partial fault-tolerance structure.

With these edge-disjoint paths, the flow network is updated (edge costs and capacities) for the second f-TSV, as shown in Fig. 4(b). The edges on the edge-disjoint paths of the first f-TSV have zero cost. Considering a vertex $s$, for example, the edge $(f', s)$ has zero cost since it has already been traversed by a TSV path of the first f-TSV, while two other edges $(f', s)$ have a cost of 3 ($= C^1$), because neither edge is traversed by any TSV path of the first f-TSV and $tc[s] = 1$.
A new TSV connection would be introduced if we used (f′, s) or (f′, s) on the edge-disjoint paths for f, which would increase the number of input ports of the multiplexer on the input side of TSV s. With the updated network, we can find two edge-disjoint paths from f′ to the s-TSVs that make use of the existing TSV connections as much as possible, which potentially reduces the TSV connections on individual TSVs and minimizes the maximum number of input ports of the multiplexers. The bottom part of Fig. 4(b) shows the TSV connections in the updated partial fault-tolerance structure. Repeating the same process until the min-cost-max-flow model is solved for all f-TSVs, we obtain edge-disjoint paths from each split f-TSV vertex in V′ to the split s-TSV vertices in V′, as shown in Fig. 5. Here the solid edges are TSV connections.

Fig. 6: The flow of the proposed fault tolerance TSV planning (flowchart blocks: f-TSV planning results; partition the group with the smallest yield; determine the number of tolerant faults for new groups; allocate s-TSVs to new groups; calculate the chip yield; satisfy target yield? (Y/N); generate fault-tolerance structure; TSV planning solution).

VI. FAULT TOLERANCE TSV PLANNING
In this section, we discuss a top-down fault tolerance TSV planning framework that forms f-TSV groups and generates adaptive fault-tolerance structures. Owing to the adaptive fault-tolerance structures, both the number of f-TSV groups and the total number of s-TSVs are greatly reduced.

Given an f-TSV planning result and the floorplan of the blocks, the number and positions of all f-TSVs are known. First, f-TSV groups are formed by top-down iterative f-TSV partitioning under the yield constraint; then, adaptive fault-tolerance structures are generated for each group. In each iteration of the f-TSV partitioning stage, the group with the smallest yield is partitioned into two new f-TSV groups using a min-cut bi-partitioning algorithm, and the required s-TSVs are allocated so that the group yields can be evaluated. The iterative f-TSV partitioning is repeated until the target chip yield is satisfied. Therefore, the number and positions of the required s-TSVs for each f-TSV group are determined simultaneously in the f-TSV partitioning stage. The chip yield is the product of the group yields, each of which depends on the maximum number of tolerant faults (K), the number of TSVs, and the defect probability of TSVs, as discussed in Section II-A. To compute K and allocate s-TSVs, we construct the replaceable relation graph G, whose vertex set includes the f-TSVs in the group and the corresponding candidate s-TSVs. The maximum number of tolerant faults, K, can be determined in polynomial time by solving a max-flow problem on G, as discussed in Section III. The min-cost-max-flow based heuristic in Section V is then used to temporarily generate an adaptive K-fault tolerance structure, from which the number of required s-TSVs is determined.

Finally, either the ILP based method in Section IV or the min-cost-max-flow (MCMF) based heuristic in Section V can be adopted to generate adaptive fault-tolerance structures that minimize both the multiplexer delay overhead and the hardware cost. Fig. 6 illustrates the proposed TSV planning framework.

In [13], a greedy method is used to partition f-TSVs into groups, and then an ILP formulation is adopted to allocate s-TSVs for each group. The generation of fault-tolerance structures is not considered, since regular structures are assumed to always exist. In [14], the TSV planning framework includes a top-down partitioning followed by a bottom-up iterative merging (clustering) to reduce the number of f-TSV groups. Then, a min-cost-max-flow based method is used to allocate s-TSVs for each group, and an ILP model is adopted to generate fault-tolerance structures. In both [13] and [14], the same number of s-TSVs is allocated to every f-TSV group, and, for an f-TSV group, the key point is to ensure enough candidate s-TSVs that can be shared by all the f-TSVs in the group. As a result, many small f-TSV groups are formed, which potentially causes an overuse of s-TSVs.

Compared with these two works, the proposed TSV planning framework includes a similar top-down partitioning stage, but the allocation of s-TSVs during the partitioning is very different, because adaptive fault-tolerance structures with various numbers of s-TSVs are built temporarily by solving a sequence of min-cost-max-flow problems.

VII. EXPERIMENTAL RESULTS
The proposed algorithms have been implemented in C++ and tested on a 12-core 2.0 GHz Linux server with 64 GB RAM. The TSV pitch is assumed to be 5 µm × µm [3]. LEDA [19] is adopted to solve the max-flow and min-cost-max-flow problems, GLPK [20] is used as the ILP solver, and hMetis [21] is adopted for f-TSV partitioning.

A. Effectiveness and Efficiency of Fault-Tolerance Structure Generation Method
We generate several TSV replaceable relation graphs by applying the proposed TSV planning framework to the MCNC and GSRC benchmarks. Each graph contains f-TSVs and the corresponding candidate s-TSVs, i.e., the s-TSVs covered by at least one of the bounding boxes of the f-TSVs. To compare the proposed ILP model with the ILP method in [14] on these graphs, we adapt the ILP formulation in [14] as follows. To generate a K-fault tolerance structure on a TSV replaceable relation graph G, we select K of the n s-TSVs and define unit-flow constraints from all f-TSVs to the chosen K s-TSVs. If the K-fault tolerance structure is still not achieved after trying all C(n, K) combinations, we consider that the ILP method in [14] cannot generate the K-fault tolerance structure on G.

In addition, the previous work [14] deals with a special type of TSV fault-tolerance structure generation: it assumes that a predetermined number of s-TSVs is assigned to each TSV group and that any f-TSV in a group can be replaced by any s-TSV within the group. We therefore also generate some specific TSV replaceable relation graphs by applying the TSV planning methods in [14] to the MCNC and GSRC benchmarks. Since the f-TSVs can be replaced by all n s-TSVs in each such graph, the n-fault tolerance structure always exists. The two sets of graphs correspond to the first and second groups of rows in TABLE II, respectively.

First, we show the effectiveness of the proposed ILP model. TABLE II shows the experimental results, where "ILP" and "Heuristic" denote the results of the proposed ILP model and the min-cost-max-flow based heuristic method, respectively.

TABLE II: Comparison between ILP [14] and our methods for generating adaptive fault-tolerance structure.
Graph   m   n   |E|  K   ILP [14]                        ILP                             Heuristic
G      13   4   129  2   3  2  6.85 (0.18%)    603.68    3  2  9.65 (0.25%)    67.80     3  4  16.79 (0.43%)   0.013
G      14   4   101  1   NA  NA  NA  >3600
G      15   5   177  2   NA  NA  NA  >3600
G      18   5   215  2   NA  NA  NA  >3600
G      18   6   199  2   NA  NA  NA  >3600
G      21   7   255  2   NA  NA  NA  >3600  >3600
G      26  13   529  4   NA  NA  NA  >3600  >3600

G      12   5   155  5   5  5  49.77 (0.25%)   304.91    5  5  49.77 (0.25%)   306.14    6  5  56.26 (0.28%)   0.007
G      14   5   197  5   5  5  10.21 (0.06%)   3435.64   5  5  10.21 (0.06%)   3468.93   5  5  11.84 (0.07%)   0.010
G      16   5   225  5   5  5  108.19 (0.71%)  3519.16   5  5  108.19 (0.71%)  3519.16   7  5  123.18 (0.81%)  0.016
G      18   5   329  5   NA  NA  NA  >3600  >3600
G      23   6   467  6   NA  NA  NA  >3600  >3600
G      24   6   550  6   NA  NA  NA  >3600  >3600
G      25   7   524  7   NA  NA  NA  >3600  >3600

Columns "m", "n", "|E|", and "K" list the number of f-TSVs, the total number of available s-TSVs, the number of edges, and the maximum number of tolerant faults on each TSV replaceable relation graph. "NA" indicates that the K-fault tolerance structure cannot be achieved within the time limit (3600 s). As shown in TABLE II, the ILP method in [14] generates the fault-tolerance structure only on the two smallest graphs, whereas the proposed ILP formulation achieves the fault-tolerance structure on six graphs.

Second, we show the efficiency of the proposed heuristic method. TABLE II also compares the proposed heuristic method with the proposed ILP method. On the small graphs of both sets, the fault-tolerance structures generated by ILP have no larger maximum port number of multiplexers and use no more s-TSVs than those generated by the heuristic method. Therefore, for small TSV replaceable relation graphs, ILP achieves an optimal solution, which can be used to verify the quality of the heuristic solution. However, because ILP is NP-hard in general, its runtime increases dramatically with the size of the TSV replaceable relation graph. As shown in TABLE II, the ILP method cannot generate the fault-tolerance structure on the large graphs of either set within the time limit (3600 s). Therefore, for large TSV replaceable relation graphs, the ILP based method is very time consuming, which indirectly demonstrates the efficiency of the proposed heuristic method.

In addition, the parameter C in edge cost functions (10) and (11) is set through experiments on the MCNC and GSRC benchmarks. If C is set to 4, some edge cost values are out of bound and cannot be handled by the min-cost-max-flow based model. We therefore also set C to 2 and 3; the resulting numbers of used s-TSVs and maximum port numbers of multiplexers are shown in TABLE III.

TABLE III: Effect of C on s-TSV numbers and maximum port number of multiplexers.
Benchmark   C = 2 (#s-TSV, max port)   C = 3 (#s-TSV, max port)
ami33        52   4                     46   4
ami49        80   8                     66   6
n50         108   7                     98   7
n100        181   8                    169   7
n200        267   7                    250   7
n300        395   8                    381   6

Compared with C = 2, C = 3 achieves fault-tolerance structures with fewer used s-TSVs and an equal or smaller maximum port number of multiplexers. Therefore, we set C to 3 in the experiments.

B. Comparison with Previous TSV Fault Tolerance Planning Work
We use simulated annealing-based multi-layer floorplanning [22] to generate the block floorplan and the f-TSV planning method in [14] to generate the f-TSV planning result that is input to the proposed fault-tolerance TSV planning framework. Based on the same f-TSV planning result, we run the flows in [13] and [14] and the proposed heuristic based framework. The experiments are performed on MCNC and GSRC benchmarks, including two MCNC circuits (ami33 and ami49) and four GSRC circuits (n50, n100, n200, and n300). We also adopt an industrial 2D design, which contains 403266 cells and 448514 nets; hMetis [21] is used to partition it into blocks for floorplanning. Based on different block numbers, two benchmark cases, t337 and t469, are generated: t337 has 337 blocks and 1836 nets, while t469 has 469 blocks and 5479 nets. Since the square has the smallest perimeter among all rectangles with the same area [23], the shapes of all the blocks are set to squares. The experiment is executed 20 times independently for each benchmark.

TABLE IV: Comparisons among [13], [14], and the proposed adaptive fault-tolerance structure (AFTS) under 3-fault tolerance structures (target yield = 99.7%, p = 0.001).
Bench  #f-TSV | [13]: #s-TSV #group yield | [14]: #s-TSV #group port K yield | AFTS (K ≤ 3): #s-TSV #group port K yield | AFTS (maximum K): #s-TSV #group port K yield
ami33    55   |  48   16   100%     |  48   16   4  3  100%     |  31   2  3  3  100%     |  46   2  4  4  100%
ami49   130   |  72   24   100%     |  66   22   5  3  100%     |  54   2  5  3  99.99%   |  66   2  6  5  100%
n50     386   | 210   70   99.97%   | 204   68   7  3  100%     |  82   5  6  2  99.96%   |  98   5  7  5  99.98%
n100    592   | 294   98   99.91%   | 291   97   7  3  99.94%   | 136   7  6  3  99.91%   | 169   7  7  6  99.93%
n200
n300
t337    640   | 315  105   99.90%   | 309  103   4  3  99.91%   | 158   8  5  3  99.88%   | 214   6  6  6  99.90%
t469

TABLE V: Comparisons among [13], [14], and the proposed adaptive fault-tolerance structure (AFTS) under 3-fault tolerance structures (target yield = 99.5%, p = 0.01).
Bench  #f-TSV | [13]: #s-TSV #group yield | [14]: #s-TSV #group port K yield | AFTS (K ≤ 3): #s-TSV #group port K yield | AFTS (maximum K): #s-TSV #group port K yield
ami33    54   |  51   17   100%     |  51   17   4  3  100%     |  35   4  3  3  100%     |  48   4  4  4  100%
ami49   130   |  87   29   99.96%   |  81   27   5  3  99.96%   |  62   5  4  3  99.94%   |  73   5  5  4  99.95%
n50     388   | 231   77   99.89%   | 222   74   6  3  99.92%   | 102   8  5  3  99.88%   | 113   8  7  5  99.90%
n100    589   | 330  110   99.84%   | 324  108   6  3  99.87%   | 165  12  5  3  99.84%   | 194  11  7  6  99.87%
n200
n300
t337    637   | 342  114   99.82%   | 330  110   4  3  99.82%   | 184  13  4  3  99.78%   | 227  12  7  7  99.81%
t469

TABLE VI: Comparisons among [6], [13], [14], and the proposed adaptive fault-tolerance structure (AFTS) under 1-fault tolerance structures (target yield = 99.5%).
Bench  #f-TSV |                   |                      |                      | AFTS (K = 1)
ami33    52   |  16   16  99.99%  |  16   16  4  99.99%  |  16   16  3  99.99%  | 13   2  2  99.99%
ami49   124   |  28   28  99.95%  |  25   25  5  99.96%  |  25   25  4  99.96%  | 22   3  3  99.95%
n50     383   |  74   74  99.84%  |  68   68  8  99.87%  |  68   68  4  99.87%  | 53   8  3  99.84%
n100    596   | 108  108  99.65%  |  95   95  8  99.68%  |  95   95  5  99.68%  | 78  12  4  99.64%
n200
n300
t337    639   | 124  124  99.65%  | 113  113  8  99.67%  | 113  113  6  99.67%  | 91  16  5  99.64%
t469

In fault-tolerance structures, multiplexers are used to reroute signals, and the delay of a multiplexer increases with the number of input ports. Moreover, the hardware cost incurred by the fault-tolerance structure is related to the number of s-TSVs. In this experiment, we compare the number of s-TSVs and the maximum port number of multiplexers of [13], [14], and the proposed TSV planning framework under 3-fault tolerance structures. The number of layers is set to 3, the target chip yield is set to 99.7%, and the TSV defect probability p is set to 0.001. The yield results are accurate to the fourth decimal place. Three s-TSVs are assigned to each f-TSV group in [13] and [14]; that is, the maximum number of tolerant faults K equals 3.

TABLE IV lists the statistical results averaged over 20 independent experiments. All results listed in the table satisfy the target chip yield. Column "K" gives the number of tolerant faults. Since the generation of fault-tolerance structures is not considered in [13], its maximum port number of multiplexers is not listed. As shown in TABLE IV, the number of f-TSV groups is greatly reduced by the proposed method. Compared with [13] and [14], the proposed fault tolerance TSV planning framework reduces the number of used s-TSVs by 32.79% and 31.67% on average, respectively. However, if the maximum K is used for each group in the proposed framework, larger multiplexers are required, because the maximum number of tolerant faults (K) in adaptive fault-tolerance structures is often much greater than that of [14], which is fixed at 3.
As a result, the maximum port number of multiplexers increases accordingly in the generated fault-tolerance structures. To reduce the size of the required multiplexers, we also run the proposed fault tolerance TSV planning framework with K ≤ 3; that is, we set K to 3 if the maximum number of tolerant faults K in a group is greater than 3. As shown in TABLE IV, compared with [14], the proposed framework with K ≤ 3 has a comparable maximum port number of multiplexers, while the required s-TSVs are reduced by 50% on average under the same target yield.

The TSV defect probability p in [12] ranges from 0.001 to 0.01. To see the impact of p on performance, we also execute the experiment with p set to 0.01 under 3-fault tolerance structures. The number of layers is set to 3, and the target chip yield is set to 99.5%. TABLE V lists the statistical results averaged over 20 independent experiments; all results listed in the table satisfy the target chip yield. Based on the same f-TSV planning result, we run the flows in [13] and [14] and the proposed heuristic based framework. Compared with [13] and [14], the proposed fault tolerance TSV planning framework reduces the number of used s-TSVs by 32.24% and 31.01% on average, respectively. To reduce the size of the required multiplexers, we again run the proposed framework with K ≤ 3. As shown in TABLE V, compared with [14], the proposed framework with K ≤ 3 has a comparable maximum port number of multiplexers, while the required s-TSVs are reduced by 46.50% on average under the same target yield.

Besides, in [6], 1-fault tolerance structures are generated using a minimum spanning tree based method. However, it is difficult to apply this method to fault-tolerance structures that use more than one spare TSV.
In addition, the delay overhead introduced by the multiplexers, which are used for rerouting signals in the generated fault-tolerance structures, is not considered in [6]. In the worst case, the input port number of a multiplexer could equal the number of f-TSVs in the group if the tree is a star, which introduces a large delay overhead. In this experiment, we consider the 1-fault tolerance case, that is, the maximum number of tolerant faults K equals 1. Since the chip yield is lower under 1-fault tolerance structures, the target chip yield is set to 99.5%, and the TSV defect probability p is set to 0.001. We compare [6], [13], and [14] with the proposed heuristic based model under 1-fault tolerance structures. One s-TSV is assigned to each f-TSV group in [13] and [14], and in the proposed fault tolerance TSV planning framework we set K to 1 whenever the maximum number of tolerant faults K in a group is greater than 1. The minimum spanning tree method in [6] is run on top of the TSV planning method in [14]; therefore, the s-TSV numbers and chip yields of [6] and [14] are the same in this experiment.

TABLE VI lists the statistical results averaged over 20 independent experiments. As shown in TABLE VI, compared with [6] and [14], the proposed fault tolerance TSV planning framework reduces both the number of s-TSVs and the maximum port number of multiplexers when generating 1-fault tolerance structures.

Fig. 7 shows the required s-TSV numbers under various target yields, in comparison among [13], [14], and our proposed framework. The experiment is performed on the n100 benchmark, and each data point is an average of 20 independent experiments. The number of required s-TSVs increases with the target yield and is significantly reduced by the proposed framework for all target chip yields.

Fig. 7: The number of required s-TSVs under various target yields (x-axis: target yield, ticks from 0.991 to 0.997; y-axis: s-TSV number; curves: [13], [14], Ours).

VIII. CONCLUSION
In this paper, we focus on the generation of adaptive TSV fault-tolerance structures. An integer linear programming (ILP) based model and an efficient min-cost-max-flow based heuristic method are proposed to generate adaptive fault-tolerance structures that minimize both the multiplexer delay overhead and the number of used s-TSVs. Furthermore, a fault-tolerance TSV planning methodology is proposed to provide yield awareness in TSV planning. Experimental results show that, compared with the state of the art, the proposed fault tolerance TSV planning methodology effectively reduces the number of s-TSVs used for fault tolerance.

In this work, the proposed TSV fault tolerance planning is performed at the floorplanning stage, where accurate timing information is unavailable; therefore, we use only the wirelength to reflect the wire delay. In the future, we plan to evaluate the delay more accurately by performing (time-consuming) routing.

ACKNOWLEDGMENTS
The authors would like to thank the Information Science Laboratory Center of USTC for hardware and software services.

REFERENCES
[1] S. J. Souri, K. Banerjee, A. Mehrotra, and K. C. Saraswat, "Multiple Si layer ICs: Motivation, performance analysis, and design implications," in ACM/IEEE Design Automation Conference (DAC), 2000, pp. 213–220.
[2] J. W. Joyner, P. Zarkesh-Ha, and J. D. Meindl, "A global interconnect design window for a three-dimensional system-on-a-chip," in IEEE International Interconnect Technology Conference (IITC)
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 36, no. 10, pp. 1593–1619, 2017.
[5] I. Loi, S. Mitra, T. H. Lee, S. Fujita, and L. Benini, "A low-overhead fault tolerance scheme for TSV-based 3D network on chip links," in IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Nov. 2008, pp. 598–602.
[6] Y.-G. Chen, W.-Y. Wen, Y. Shi, W.-K. Hon, and S.-C. Chang, "Novel spare TSV deployment for 3-D ICs considering yield and timing constraints," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 34, no. 4, pp. 577–588, 2015.
[7] Q. Xu, L. Jiang, H. Li, and B. Eklow, "Yield enhancement for 3D-stacked ICs: Recent advances and challenges," in IEEE/ACM Asia and South Pacific Design Automation Conference (ASPDAC), Feb. 2012, pp. 731–737.
[8] H.-H. S. Lee and K. Chakrabarty, "Test challenges for 3D integrated circuits," IEEE Design & Test of Computers, vol. 26, no. 5, pp. 26–35, 2009.
[9] C. Ferri, S. Reda, and R. I. Bahar, "Strategies for improving the parametric yield and profits of 3D ICs," in IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Nov. 2007, pp. 220–226.
[10] C.-W. Chou, Y.-J. Huang, and J.-F. Li, "Yield-enhancement techniques for 3D random access memories," in International Symposium on VLSI Design, Automation, and Test (VLSI-DAT), Apr. 2010, pp. 104–107.
[11] L. Jiang, R. Ye, and Q. Xu, "Yield enhancement for 3D-stacked memory by redundancy sharing across dies," in IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Nov. 2010, pp. 230–234.
[12] L. Jiang, Q. Xu, and B. Eklow, "On effective TSV repair for 3D-stacked ICs," in IEEE/ACM Proceedings Design, Automation and Test in Europe (DATE), Mar. 2012, pp. 793–798.
[13] S. Wang, M. B. Tahoori, and K. Chakrabarty, "Defect clustering-aware spare-TSV allocation for 3D ICs," in IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Nov. 2015, pp. 307–314.
[14] Q. Xu, S. Chen, X. Xu, and B. Yu, "Clustered fault tolerance TSV planning for 3D integrated circuits," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 36, no. 8, pp. 1287–1300, 2017.
[15] Y. Chen, D. Niu, Y. Xie, and K. Chakrabarty, "Cost-effective integration of three-dimensional (3D) ICs emphasizing testing cost analysis," in IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Nov. 2010, pp. 471–476.
[16] B. Noia and K. Chakrabarty, Design-for-Test and Test Optimization Techniques for TSV-based 3D Stacked ICs. Switzerland: Springer, 2014.
[17] L. Jiang, Q. Xu, and B. Eklow, "On effective through-silicon via repair for 3-D stacked ICs," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 32, no. 4, pp. 559–571, 2013.
[18] A. Schrijver, Combinatorial Optimization: Polyhedra and Efficiency. Berlin: Springer Science & Business Media, 2002, vol. 24.
[19] K. Mehlhorn and S. Naher, LEDA: A Platform for Combinatorial and Geometric Computing. Cambridge University Press, 1999.
[20] A. Makhorin, "GLPK (GNU linear programming kit)," 2008.
[21] G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar, "Multilevel hypergraph partitioning: applications in VLSI domain," IEEE Transactions on Very Large Scale Integration Systems (TVLSI), vol. 7, no. 1, pp. 69–79, 1999.
[22] S. Chen and T. Yoshimura, "Multi-layer floorplanning for stacked ICs: Configuration number and fixed-outline constraints," Integration, the VLSI Journal, vol. 43, no. 4, pp. 378–388, 2010.
[23] ——, "Fixed-outline floorplanning: Block-position enumeration and a new method for calculating area costs,"