Machine Learning Clustering Techniques for Selective Mitigation of Critical Design Features
Thomas Lange, Aneesh Balakrishnan, Maximilien Glorieux, Dan Alexandrescu, Luca Sterpone
Thomas Lange∗†, Aneesh Balakrishnan∗‡, Maximilien Glorieux∗, Dan Alexandrescu∗, Luca Sterpone†
∗iRoC Technologies, Grenoble, France
†Dipartimento di Informatica e Automatica, Politecnico di Torino, Torino, Italy
‡Department of Computer Systems, Tallinn University of Technology, Tallinn, Estonia
{thomas.lange, aneesh.balakrishnan, maximilien.glorieux, dan.alexandrescu}@iroctech.com, [email protected]

Abstract—Selective mitigation or selective hardening is an effective technique to obtain a good trade-off between the improvements in the overall reliability of a circuit and the hardware overhead induced by the hardening techniques. Selective mitigation relies on preferentially protecting circuit instances according to their susceptibility and criticality. However, ranking circuit parts in terms of vulnerability usually requires computationally intensive fault-injection simulation campaigns. This paper presents a new methodology which uses machine learning clustering techniques to group flip-flops with similar expected contributions to the overall functional failure rate, based on the analysis of a compact set of features combining attributes from static elements and dynamic elements. Fault simulation campaigns can then be executed on a per-group basis, significantly reducing the time and cost of the evaluation. The effectiveness of grouping similarly sensitive flip-flops by machine learning clustering algorithms is evaluated on a practical example. Different clustering algorithms are applied and the results are compared to an ideal selective mitigation obtained by exhaustive fault-injection simulation.
Index Terms—Transient Faults, Single-Event Upsets, Selective Mitigation, Selective Hardening, Soft Error Protection
I. INTRODUCTION
The advancement of process technologies in recent years has made it possible to manufacture chips with tens of millions of flip-flops. At the same time, due to technology scaling, lower supply voltages and higher operating frequencies, circuits have become more vulnerable to reliability threats, such as transient faults. In particular, Soft Errors in flip-flops are a major concern, and countermeasures have to be taken by using hardening techniques, such as Triple Modular Redundancy (TMR). However, a fully protected chip might not meet the system requirements in terms of area, power or target frequency. Since for many applications it is not necessary to decrease the vulnerability to the possible minimum, Selective Mitigation can be used. Thereby, only the most critical elements of the circuit are protected against Soft Errors and thus the failure rate of the system is decreased to meet all requirements [1]–[3].

In order to perform Selective Mitigation, an exhaustive failure analysis is required to identify and rank the most vulnerable elements of the circuit. The failure analysis on a functional level grows with the design size, the
This work was supported by the RESCUE project, which has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No. 722325.

number of workloads to analyse and the duration in cycles of each workload. A detailed functional failure analysis requires a significant investment in terms of human effort, processing resources and tool licenses. Studies have shown that exhaustive fault simulation is not feasible for today's complex circuits [4].
A. Objective of This Work
Identifying and ranking the sequential elements which are most vulnerable to transient faults usually requires computationally intensive fault-injection simulation campaigns. This procedure can be optimized by grouping together flip-flops which are expected to have a similar sensitivity to faults. Fault injection campaigns can then be performed on a per-group basis and thus significantly reduce the time and cost of the evaluation [5]. However, this optimization relies heavily on the effectiveness of the grouping methodology. Therefore, we propose a new approach to effectively group together flip-flops which are expected to have a similar sensitivity to functional failures. The approach is based on machine learning clustering techniques which use a set of features to characterize each flip-flop in the circuit. The feature set combines attributes from static and dynamic elements. Machine learning clustering algorithms are evaluated and compared to an ideal selective mitigation obtained by exhaustive fault-injection simulation.
B. Organisation of the Paper
In the following sections, first, the use of clustering techniques for selective mitigation is summarized. Further, the general principle of machine learning clustering techniques is explained. Section III presents the proposed methodology to group flip-flops based on machine learning clustering and the used feature set. The approach is evaluated on a practical example in Section IV for different machine learning algorithms. Section V concludes the paper and gives some prospects for future work.
II. CLUSTERING TECHNIQUES FOR SELECTIVE MITIGATION
The approach of protecting only the smallest set of elements in a circuit to meet a specified reliability target is called selective mitigation. For this, the individual elements of a circuit need to be ranked from the most to the least sensitive. In the case of transient faults in the sequential logic which lead to a functional failure, this usually requires exhaustive fault injection simulation, which might not be feasible for large and complex circuits.

In order to reduce the mentioned fault injection efforts, fault simulation campaigns can be performed on a group basis. For this, prior to any simulation, flip-flops are grouped together and the statistical fault injection is performed on each of these groups. This can significantly reduce the time and cost of the evaluation. However, the accuracy of this coarse-grained fault injection relies solely on an effective approach to group together flip-flops which have highly similar sensitivity to faults.

Previously studied techniques for clustering are based on buses, design hierarchy, or a hybrid approach using buses, hierarchy and signal naming information [5]. These approaches have several drawbacks. The bus-based and hierarchy-based approaches are only able to provide a fixed number of clusters. The hierarchy-based approach often provides a low number of clusters which can be very heterogeneous. The bus-based approach often results in a high number of clusters with few flip-flops per cluster and one larger cluster containing all the flip-flops which do not belong to any bus. Thus, the reduction of the number of fault injections is limited and the large cluster tends to be heterogeneous, which negatively impacts the effectiveness of the clustering. The hybrid approach overcomes the problem of the fixed number of clusters by combining the bus- and hierarchy-based approaches and also taking the signal names into account.
The approach assumes that flip-flops with similar naming also have a similar function and thus a similar sensitivity to faults. However, this relies on a consistent naming convention and a strong correlation between the naming and the function within the circuit.

A. Machine Learning Clustering Techniques
In order to tackle the drawbacks of the previously studied clustering approaches, this paper investigates whether machine learning clustering techniques are suitable to group the flip-flops. Clustering techniques in the machine learning domain belong to the unsupervised learning category. In general, these algorithms try to group similar objects together based on a given set of features. The feature set characterizes the objects, and the clustering algorithms group together objects which have similar feature values, while objects from different groups should have highly dissimilar feature values [6].

In this paper, the flip-flops are characterized by a set of features which will be described in the next section. Machine learning clustering algorithms use this feature set to group flip-flops together. Afterwards, the effectiveness of the grouping is evaluated on a practical example and it is determined whether flip-flops with similar fault sensitivity are grouped together.
III. PROPOSED MACHINE LEARNING CLUSTERING APPROACH
The proposed methodology is based on clustering techniques for selective mitigation. In contrast to previous work, the clustering approach uses machine learning clustering algorithms and a feature set to characterize each flip-flop in the circuit. In this way no assumptions are made about the design and a general methodology is provided.
[Fig. 1. Selective Mitigation by Using Machine Learning Clustering. Flow: Circuit Representation (RTL Design) and Testbench → Extract Features per Flip-Flop → Flip-Flop Feature Set → Apply ML Clustering Algorithm → Flip-Flop Groups (N_c) → Fault Injection Campaign on Group Basis → Functional Failure Rate per Group → Rank and Select Groups to Mitigate.]
The steps to perform a selective mitigation are illustrated in Fig. 1. First, the features for each flip-flop in the design are extracted by using the RTL description and a corresponding testbench. Second, the machine learning clustering algorithm is applied to the extracted feature set and the flip-flop groups are obtained. The resulting number of groups N_c can be adjusted by the parameters of the machine learning clustering algorithm. The number of groups also dictates the effort needed for the next step: the statistical fault injection. The fault injection campaign is performed on the computed flip-flop clusters and thus needs more effort (in terms of computing resources, human effort and tool licenses) with a higher number of clusters, and vice versa. Eventually, the sensitivity to faults for each cluster is obtained and the clusters can be ranked from the most sensitive to the least sensitive. The selective mitigation is applied starting from the most sensitive cluster until the reliability requirement is met.

TABLE I
FEATURE SET TO CHARACTERISE A FLIP-FLOP INSTANCE FF_i

Feature Name     Description

Structural Related Features
…                … FF_i within the circuit.
…                … FF_i within the bus.
Bus Length       Describes the total length of the bus signal FF_i is part of.
Bus Label        The number/label of the bus signal FF_i is part of.
Module Label     The number/label of the hierarchical module FF_i is part of.

Signal Activity Related Features
@0/@1            The relative time the output of FF_i is at logical 0/1.
State Changes    The number of state changes.

A. The Feature Set
Previous work has shown that the masking effects and the vulnerability of a flip-flop can be related to certain characteristics of the circuit, such as the circuit structure and signal probability [7]–[9]. Motivated by this idea, a feature set was developed in [10], which is adapted for the approach presented in this paper. The feature set characterizes each flip-flop instance in the circuit and combines attributes from static elements, such as the circuit structure, as well as dynamic elements, such as the signal activity.

The original approach was based on the gate-level netlist of a design and contained features which correspond to synthesis attributes. In order to obtain a more general approach which can already be applied at an early design stage, the methodology presented in this paper uses only features which can be derived from the RTL description of the design (e.g. by performing a fast logic elaboration). Further, two additional features are extracted to reflect the bus- and hierarchy-based clustering approaches described in the previous section. In total the feature set consists of 20 features. The features extracted for each flip-flop
FF_i are described in detail in Tab. I. They are divided into two parts: the structural and the signal activity related features. The structural features describe a flip-flop in relation to other flip-flops in the circuit without taking the (technology-dependent) combinatorial logic into account. To consider the workload of the circuit, features are extracted which describe the dynamic behaviour of the flip-flops. For this, information related to the signal activity is considered, such as the state distribution and transitions.

IV. EVALUATING MACHINE LEARNING CLUSTERING FOR SELECTIVE MITIGATION
In this section the machine learning clustering approach is evaluated on a practical example. Different clustering algorithms are used to group the flip-flops based on the presented feature set. The effectiveness of the clustering algorithms is measured by evaluating the created flip-flop clusters. The goal is to create flip-flop groups which have a similar vulnerability to critical failures. Therefore, first, an exhaustive full flat statistical fault injection campaign was performed to obtain the sensitivity to critical failures for each flip-flop and to provide an independent measure of the sensitivity. Afterwards, this data is used to evaluate the different clustering algorithms against an ideal and a random approach.
A. Circuit Under Test
For the practical example, the Ethernet 10GE MAC Core from OpenCores is used. This circuit implements the Media Access Control (MAC) functions as defined in the IEEE 802.3ae standard. The 10GE MAC core has a 10 Gbps interface (XGMII TX/RX) to connect it to different types of Ethernet PHYs and one packet interface to transmit and receive packets to/from the user logic [11]. The circuit consists of control logic, state machines, FIFOs and memory interfaces. It is implemented at the Register-Transfer Level (RTL) and is publicly available on OpenCores. The RTL description of the design consists of 1054 flip-flops.

The corresponding testbench writes several packets to the 10GE MAC transmit packet interface. As packet frames become available in the transmit FIFO, the MAC calculates a CRC and sends them out to the XGMII transmitter. The XGMII TX interface is looped back to the XGMII RX interface in the testbench. The frames are thus processed by the MAC receive engine and stored in the receive FIFO. Eventually, the testbench reads frames from the packet receive interface and prints out the results [11].
B. Full Flat Statistical Fault Injection Results
In order to evaluate the effectiveness of the clustering algorithms, the sensitivity of the considered design was measured by performing a flat statistical fault injection campaign. The simulations are performed at the RT-Level using the corresponding testbench, which allows a functional verification. Thus, it is possible to evaluate the system-level impact of errors. The fault injection mechanism consists of inverting the value stored in a flip-flop by using a simulator function.

In networking applications, such as the considered design, important data is protected by checksums. This means that a minor payload corruption can be handled by the error correction algorithm. However, if the fault causes the circuit to stop working, interrupting the flow of sent packets, or data is continuously corrupted, then the effect can be considered critical. These failures are highly problematic and should be mitigated by selective mitigation.

In each flip-flop 200 faults are injected at a random time during the active phase of the test-case. The functional failure rate of each flip-flop is calculated by dividing the number of simulation runs which lead to a functional failure by the total number of simulation runs. The overall results of the flat statistical fault-injection campaign are presented in Table II. The average critical failure rate is 5.13 % and the most critical flip-flops were identified and ranked.
TABLE II
SEU FAULT INJECTION CAMPAIGN RESULTS

                          Total     Per Injection
Injection Targets (FFs)   1054      -
Injected Faults (SEU)     210800    -
Functional Failures       10814     5.13 %
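The failure-rate computation described above is simple arithmetic; as a sketch (the function name is illustrative, not from the paper), it reproduces the campaign averages in Table II:

```python
def failure_rate(failures, injections):
    """Functional failure rate: failing runs divided by total runs."""
    return failures / injections

# Campaign totals: 1054 flip-flops x 200 injected SEUs each.
total_injections = 1054 * 200                     # 210800
avg_rate = failure_rate(10814, total_injections)  # ≈ 0.0513, i.e. 5.13 %
```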
C. Machine Learning Clustering for Selective Mitigation
The machine learning clustering algorithms are used to group flip-flops based on the feature set described in Section III. The features are extracted from the RTL design of the circuit. For this, a fast elaboration is performed and the design is converted into a graph representation. The structural features are extracted from the graph by using graph algorithms, and the features regarding the signal activity are extracted by tracing the simulation. The feature extraction is automated and overall takes less than 5 minutes.

The effectiveness of the clustering is evaluated considering the clusters would be used to selectively mitigate against critical failures. Therefore, the data obtained from the exhaustive statistical fault injection is used to compute the sensitivity to critical faults for each cluster. Then, the reduction of the overall sensitivity of the circuit was calculated by varying the number of groups being protected. For the protection it is assumed that the considered flip-flops within the group are substituted by hardened cells, TMR or other approaches. It is assumed that after mitigation, the sensitivity to Single-Event Upsets is zero. The flip-flops to protect were selected starting with the most sensitive clusters first. The results are compared against an ideal and a random approach. In the ideal approach the most sensitive flip-flops are selected based on the exhaustive fault injection campaign. The random approach selects flip-flops to protect randomly (averaged over 100 independent runs).

The clustering algorithms were implemented by using Python's scikit-learn machine learning framework [12] and applied to the extracted flip-flop feature set. This process took only several seconds and its cost is negligible. Each considered clustering algorithm has different parameters which can be adjusted. They affect the performance of the clustering and also the resulting number of clusters.
For some of the algorithms the number of clusters can be specified directly. Other algorithms try to find the optimal number of clusters within the constrained parameters. In the following evaluation the parameters were chosen such that the numbers of resulting clusters are about 20 %, 10 % and 5 % of the number of flip-flops in the circuit. This corresponds to a reduction of the fault injection efforts by 5×, 10× and 20×, respectively.
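The evaluation procedure above — aggregating per-flip-flop failure rates into per-cluster sensitivities, ranking, and protecting the most sensitive clusters first — can be sketched as follows. The helper names and the sum-aggregation of member rates are illustrative assumptions, not the paper's exact implementation:

```python
def cluster_sensitivities(failure_rate, cluster_of):
    """Aggregate per-flip-flop functional failure rates into a per-cluster
    sensitivity (here: sum over members) and rank clusters, most sensitive first."""
    sens = {}
    for ff, rate in failure_rate.items():
        c = cluster_of[ff]
        sens[c] = sens.get(c, 0.0) + rate
    return sorted(sens.items(), key=lambda kv: kv[1], reverse=True)

def residual_sensitivity(failure_rate, cluster_of, protected):
    """Overall critical sensitivity remaining after hardening the given clusters,
    assuming zero post-mitigation SEU sensitivity (as in the paper)."""
    return sum(rate for ff, rate in failure_rate.items()
               if cluster_of[ff] not in protected)

# Hypothetical data: three flip-flops assigned to two clusters.
rates = {"ff_a": 0.5, "ff_b": 0.1, "ff_c": 0.4}
member = {"ff_a": 0, "ff_b": 0, "ff_c": 1}
ranking = cluster_sensitivities(rates, member)  # cluster 0 first (0.6 vs 0.4)
```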
1) K-Means Clustering:
K-Means clustering aims to partition the given data points into k clusters. The algorithm computes the Euclidean distance for each data point in the feature space. The data samples are then separated such that the variance within each cluster is minimized. The algorithm requires the number of clusters N_c to be specified.

Fig. 2 shows the overall critical sensitivity when the grouping is performed by K-Means clustering with different numbers of clusters N_c. The flip-flops to protect were selected starting with the most sensitive clusters first until all flip-flops are mitigated. It can be noted that the grouping performs better when the number of clusters is higher.
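The paper uses scikit-learn's implementation [12]; purely as an illustration of the partitioning step, a minimal pure-Python sketch of Lloyd's K-Means on hypothetical two-feature flip-flop data:

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Minimal Lloyd's K-Means: assign each point to its nearest centroid
    (squared Euclidean distance), recompute centroids as cluster means,
    repeat until the centroids stop moving."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids = k random points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:               # assignment step
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        new = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        if new == centroids:           # converged
            break
        centroids = new                # update step
    return clusters

# Hypothetical feature vectors for four flip-flops (two tight groups):
groups = kmeans([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)], k=2)
```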
2) Agglomerative Clustering:
Agglomerative Clustering performs a hierarchical clustering by creating nested clusters, which are represented as a tree. The advantage of hierarchical clustering is that any valid measure of distance can be used, in comparison to K-Means clustering, which operates on a Euclidean distance metric. Agglomerative Clustering uses a bottom-up approach where each data point starts as its own cluster. Clusters are merged together following the linkage criterion, which defines the metric used for the merge strategy. The best results were obtained by using the maximum or complete linkage, which minimizes the maximum distance between data points of pairs of clusters, together with a Manhattan distance metric (l1 norm).

(The authors are aware that assuming zero sensitivity after mitigation is a simplification and that important aspects related to physical design are not considered. However, the approach presented here focuses on the evaluation at the functional level by using the RTL description of the circuit. To obtain a more complete analysis, the presented approach could be combined with, e.g., the classical analysis of electrical and temporal masking using the post place-and-route gate-level netlist.)

[Fig. 2. K-Means Clustering — (a) from 0 % to 100 % of mitigated flip-flops; (b) section from 0 % to 25 % of mitigated flip-flops. Curves: Ideal, Random, K-Means N_c = 53, 105, 211; y-axis: Overall Critical Sensitivity.]

In Fig. 3 the results of the selective mitigation are shown when Agglomerative Clustering is used for different numbers of clusters N_c. Similar to K-Means clustering, the effectiveness of the selective mitigation increases with a higher number of clusters. However, the improvements from 53 to 105 and 211 clusters are very minor. When comparing Agglomerative Clustering to K-Means clustering, it can be noted that Agglomerative Clustering generally performs better.
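The complete-linkage merge strategy with an l1 metric described above can be sketched in a few lines of pure Python (the paper itself relies on scikit-learn [12]; this toy version only illustrates the mechanics):

```python
def l1(p, q):
    """Manhattan (l1) distance between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))

def agglomerative(points, n_clusters):
    """Bottom-up complete-linkage clustering: every point starts as its own
    cluster; repeatedly merge the two clusters whose *maximum* pairwise l1
    distance (complete linkage) is smallest, until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # complete linkage: cluster distance = worst-case pair distance
                d = max(l1(p, q) for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # merge the closest pair of clusters
    return clusters

# Hypothetical one-feature flip-flop data with two obvious groups:
groups = agglomerative([(0.0,), (1.0,), (10.0,), (11.0,)], n_clusters=2)
```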
3) Mean Shift Clustering:
Mean Shift Clustering aims to find dense areas of data points. The algorithm is based on a sliding window which is shifted towards regions of higher density; the density of the sliding window is proportional to the number of data points within it. The goal is to locate a center point for each group in the dataset. For the previous clustering algorithms the number of clusters had to be specified manually, whereas in Mean Shift clustering the number of clusters is determined by the algorithm. Further, unlike K-Means clustering, no spherical distribution shape of the clusters in the feature space is assumed. For the algorithm the window size w needs to be specified.

A general problem when using Mean Shift Clustering is choosing the correct window size. In this analysis the window sizes were chosen such that the resulting numbers of clusters are close to those used for the previous algorithms. Window sizes of w = 2.8, w = 1.7 and w = 0.85 resulted in 52,
[Fig. 3. Agglomerative Clustering — (a) from 0 % to 100 % of mitigated flip-flops; (b) section from 0 % to 25 % of mitigated flip-flops. Curves: Ideal, Random, Agglomerative Clustering N_c = 53, 105, 211; y-axis: Overall Critical Sensitivity.]
105 and 210 clusters, respectively. Fig. 4 shows the effectiveness of the algorithm when using the obtained clusters for selective mitigation. As for the previous algorithms, the effectiveness increases with a higher number of clusters (smaller window sizes). It can be seen that the result with a window size of w = 0.85, which results in 210 groups, is almost as good as the ideal solution.

D. Comparison and Discussion
The results of the considered clustering algorithms with different resulting numbers of clusters are summarized in Tab. III. Since the number of resulting clusters was chosen to be about the same, the average cluster size is identical or very similar. The standard deviation of the cluster sizes, however, shows that Agglomerative and Mean Shift clustering tend to create clusters with more variety in size.

In order to quantify the effectiveness of the algorithms in creating flip-flop groups with a similar functional failure rate, two metrics were derived from the results: the average variance of the functional failure rate and the maximum difference of the functional failure rates of flip-flops within the same cluster. An additional metric was created where the average variance of the functional failure rate within one cluster is weighted
TABLE III
COMPARISON OF THE DIFFERENT CLUSTERING ALGORITHMS

Clustering Algorithm       N_c   Avg. Size   Std. Dev.   Avg. Var.   Weighted Var.   Max. Diff.   Davies-Bouldin
k-Means                     53   19.89       13.01       0.009       0.095           0.92         0.73
                           105   10.04        7.45       0.010       0.041           0.92         0.68
                           211    4.99        3.80       0.003       0.008           0.64         0.51
Agglomerative Clustering    53   19.89       21.39       0.015       0.094           0.92         0.78
                           105   10.04       11.76       0.011       0.040           0.92         0.57
                           211    4.99        5.32       0.003       0.007           0.64         0.54
Mean Shift                  52   20.27       28.71       0.017       0.167           1            0.60
                           105   10.04       14.12       0.006       0.038           0.92         0.42
                           210    5.02        6.64       0.002       0.005           0.47         0.39
[Fig. 4. Mean Shift Clustering — (a) from 0 % to 100 % of mitigated flip-flops; (b) section from 0 % to 25 % of mitigated flip-flops. Curves: Ideal, Random, Mean Shift w = 2.8, 1.7, 0.85; y-axis: Overall Critical Sensitivity.]

with the cluster size. In this way a large cluster which has the same variance within the cluster as a small cluster is penalized more.

The results verify what was observed in Figs. 2, 3 and 4. In general the algorithms perform better with a higher number of clusters. Agglomerative Clustering performs slightly better than k-Means, and Mean Shift clustering performs worst with a low number of clusters but close to ideal with the highest considered number of clusters.

In order to evaluate the performance of a clustering algorithm without knowledge of the actual/reference values, several metrics exist which quantify certain characteristics of the created clusters (such as the separation of the data). These metrics can be used when the presented approach is applied to a new, unknown circuit and different algorithms are evaluated or the algorithm parameters are fine-tuned. For this analysis the Davies-Bouldin index was used; the results are shown in Tab. III. This index can be used to evaluate the separation between the clusters. It signifies the average similarity between clusters by comparing the distance between clusters with the size of the clusters themselves. An index of 0 is the lowest possible score, and values closer to zero indicate a better partition.

The Davies-Bouldin index does not fully correlate with the functional failure variance or difference metrics. However, it follows the general direction and a lower index can be observed with a higher number of clusters. Further, similar scores are obtained for k-Means and Agglomerative Clustering, as also seen when comparing the functional failure metrics. The best index, the closest to zero, is also obtained for the best result: Mean Shift clustering with 210 clusters.
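scikit-learn, which the paper uses [12], provides this metric as `sklearn.metrics.davies_bouldin_score`; a pure-Python sketch of the definition (average over clusters of the worst-case similarity ratio) looks as follows:

```python
def _centroid(cluster):
    """Mean feature vector of a cluster."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))

def _dist(p, q):
    """Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def davies_bouldin(clusters):
    """Davies-Bouldin index: for each cluster take the worst-case ratio
    (scatter_i + scatter_j) / centroid_distance_ij over all other clusters,
    then average. 0 is the best possible score (compact, well-separated)."""
    cents = [_centroid(c) for c in clusters]
    scatter = [sum(_dist(p, cents[i]) for p in c) / len(c)
               for i, c in enumerate(clusters)]
    k = len(clusters)
    return sum(max((scatter[i] + scatter[j]) / _dist(cents[i], cents[j])
                   for j in range(k) if j != i)
               for i in range(k)) / k

# Two tight, well-separated clusters score close to zero:
score = davies_bouldin([[(0.0,), (0.2,)], [(10.0,), (10.2,)]])
```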
V. CONCLUSION AND FUTURE WORK
This paper proposes a new methodology to group flip-flops which are expected to have similar contributions to the overall functional failure rate. The grouping is based on machine learning clustering and uses a compact set of features combining attributes from static elements and dynamic elements. The advantage in comparison to other existing approaches is that the approach is more flexible and makes no assumptions about the circuit or its representation. Further, the number of clusters can be chosen by the user, which determines the effort needed for the fault injection campaign.

The effectiveness of the grouping by different machine learning clustering algorithms was evaluated on a practical example and compared to an ideal solution. Good results were obtained by choosing the number of clusters as 5 % of the total number of flip-flops in the circuit, and results close to the ideal solution were obtained with numbers of clusters corresponding to 10 % to 20 % of the number of flip-flops. This means that the fault injection efforts could be reduced by a factor of 20×, 10× or 5×, respectively.

Future work will focus on identifying new features to improve the effectiveness of the clustering algorithms as well as applying the techniques to a broader range of circuits. Further, the approach could be extended by using features which take physical design aspects into account, thus reducing the effort needed to perform a complete failure analysis on several levels.

REFERENCES

[1] I. Polian, S. M. Reddy, and B. Becker, "Scalable Calculation of Logical Masking Effects for Selective Hardening Against Soft Errors," Apr. 2008, pp. 257–262.
[2] M. Maniatakos and Y. Makris, "Workload-driven selective hardening of control state elements in modern microprocessors," Apr. 2010, pp. 159–164.
[3] M. G. Valderas, M. P. Garcia, C. Lopez, and L. Entrena, "Extensive SEU Impact Analysis of a PIC Microprocessor for Selective Hardening,"
IEEE Transactions on Nuclear Science, vol. 57, no. 4, pp. 1986–1991, Aug. 2010.
[4] Y. Yu, B. Bastien, and B. W. Johnson, "A state of research review on fault injection techniques and a case study," in Annual Reliability and Maintainability Symposium, 2005. Proceedings., Jan. 2005, pp. 386–392.
[5] A. Evans, M. Nicolaidis, S. Wen, and T. Asis, "Clustering techniques and statistical fault injection for selective mitigation of SEUs in flip-flops," in International Symposium on Quality Electronic Design (ISQED), Mar. 2013, pp. 727–732.
[6] E. Alpaydin and F. Bach, Introduction to Machine Learning, ser. Adaptive Computation and Machine Learning Series. MIT Press, 2014.
[7] I. Wali, B. Deveautour, A. Virazel, A. Bosio, P. Girard, and M. Sonza Reorda, "A Low-Cost Reliability vs. Cost Trade-Off Methodology to Selectively Harden Logic Circuits," Journal of Electronic Testing, vol. 33, no. 1, pp. 25–36, Feb. 2017.
[8] O. Ruano, J. A. Maestro, and P. Reviriego, "A Methodology for Automatic Insertion of Selective TMR in Digital Circuits Affected by SEUs," IEEE Transactions on Nuclear Science, vol. 56, no. 4, pp. 2091–2102, Aug. 2009.
[9] P. K. Samudrala, J. Ramos, and S. Katkoori, "Selective Triple Modular Redundancy (STMR) Based Single-Event Upset (SEU) Tolerant Synthesis for FPGAs," IEEE Transactions on Nuclear Science, vol. 51, no. 5, pp. 2957–2969, Oct. 2004.
[10] T. Lange, A. Balakrishnan, M. Glorieux, D. Alexandrescu, and L. Sterpone, "Machine Learning to Tackle the Challenges of Transient and Soft Errors in Complex Circuits," Jul. 2019, pp. 7–14.
[11] Andre Tanguay, "10GE MAC Core Specification," Jan. 2013.
[12] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, "Scikit-learn: Machine Learning in Python,"