Feature Engineering for Scalable Application-Level Post-Silicon Debugging
Debjit Pal*, Shobha Vasudevan†
Email: {*dpal2, †shobhav}@illinois.edu

Abstract—We present systematic and efficient solutions for both observability enhancement and root-cause diagnosis in post-silicon validation of System-on-Chips (SoCs) with diverse usage scenarios. We model specifications of interacting flows in typical applications for message selection. Our message selection method optimizes flow specification coverage and trace buffer utilization. We define the diagnosis problem as identifying buggy traces as outliers and bug-free traces as inliers/normal behaviors, for which we use unsupervised learning algorithms for outlier detection. Instead of directly applying machine learning algorithms over trace data using the signals as raw features, we use feature engineering to transform raw features into more sophisticated features using domain-specific operations. The engineered features are highly relevant to the diagnosis task and generic enough to be applied across hardware designs. We present debugging and root-cause analysis of subtle post-silicon bugs in the industry-scale OpenSPARC T2 SoC. We achieve a trace buffer utilization of 98.96% with a flow specification coverage of 94.3% (average). Our diagnosis method was able to diagnose up to 66.7% more bugs and took up to 847x less diagnosis time as compared to manual debugging, with a diagnosis precision of 0.769.

I. INTRODUCTION
Post-silicon validation is a crucial component of modern SoC design validation that is performed under highly aggressive schedules and accounts for a substantial fraction of the overall validation cost [23], [34].

An expensive component of post-silicon validation is application-level use-case validation through message passing. In this activity, a validator exercises various target usage scenarios of the system (e.g., for a smartphone, playing videos or surfing the Web while receiving a phone call) and monitors for failures (e.g., hangs, crashes, deadlocks, overflows, etc.). Each usage scenario involves interleaved execution of several protocols among the IPs in the SoC design. The concurrent execution of multiple protocols [1], [9], [28], extremely long execution traces (millions of clock cycles), and the lack of bug reproducibility and error sequentiality make the post-silicon diagnosis effort extremely time consuming. In current industrial practice [20], post-silicon diagnosis is a manual, unsystematic, and ad hoc process relying primarily on the creativity of the validator, and it often takes weeks to months of validation time. Consequently, it is crucial to determine techniques to streamline this activity.

In this paper, we present an automated post-silicon debug and diagnosis methodology that shortens diagnosis time using machine learning and feature engineering.

In previous work [22], we developed a message selection method that specifically targets use-case validation. To debug a use-case scenario, the validator typically needs to observe and comprehend the messages being sent by the constituent IPs. An effective way to do that is hardware tracing [23], where a small set of signals is monitored continuously during execution. The effectiveness of hardware tracing is limited by the signals selected for tracing. Note that the omission of a critical signal manifests only during post-silicon debug, when a silicon respin is infeasible.

To select trace signals that are most beneficial for use-case debugging, we depart from the gate-level post-silicon signal selection approaches [4], [8], [18], [24] of prior art and raise the design abstraction at which we apply hardware tracing. We systematically model and analyze usage scenarios at the application level. Our message selection framework uses protocol formalizations as sequences of transactions, or flows [1], [9], [28], [31]. Given a collection of usage scenarios and the application-level flows they activate (and the constituent messages), our algorithm computes the messages that are most beneficial for debugging. We scale our observability-enhancing algorithms to the industry-scale OpenSPARC T2 SoC [29], [30], which is orders of magnitude larger and more complex than the traditional ISCAS89 benchmarks used in the literature. Along with scale, we argue with empirical evidence that the selected observables are of higher quality and are highly effective for debugging post-silicon usage scenario failures.

Although in [22] we automated message selection, debug and diagnosis remained a manual and extremely tedious task. The primary objectives of manual post-silicon debug and diagnosis are i) to understand the desired behavior from the specification, ii) to learn the correct message patterns as per the specification, and iii) to learn one or more message patterns that are symptomatic of the bug(s).
Machine learning (ML) algorithms [19], [21] automatically learn statistical models from large amounts of training data. In this paper, we argue that ML algorithms can learn models of the correct and buggy executions of an SoC during post-silicon debug and diagnosis. To train the models, we can use the large amount of post-silicon trace data that is generated during use-case validation. The primary challenge of applying ML to the diagnosis problem is in representing the training data such that the ML model can learn the differences between correct and buggy behavior and generalize to any arbitrary design.

Logical bugs in designs can be considered as triggering corner-case design behavior, which is infrequent and deviant from normal design behavior. In ML parlance, outlier detection [7], [10] is a technique to identify infrequent and deviant data points, called outliers, whereas normal data points are called inliers. We apply outlier detection techniques to automatically diagnose post-silicon failures by modeling normal design behaviors as inliers and buggy design behaviors as outliers. Consequently, the task of learning a buggy design behavior transforms into a task of modeling the buggy design behavior as an outlier.

In post-silicon execution, design bugs typically manifest as one or more patterns of consecutive messages (also known as message interleavings) in the trace data. Human engineers spend considerable time and effort to identify such patterns in the trace data. We call such a message pattern an anomalous message sequence. In this work, we apply ML to identify such anomalous message sequences automatically.

The task for the ML algorithm is outlier detection, where the ML model is expected to learn normal and buggy design behaviors and classify buggy behaviors as outliers. The data used for training is post-silicon execution data, where each data point is a triplet consisting of i) the cycle of occurrence of a message, ii) the IP interface from which the message is sourced, and iii) the message itself. In our experiments we found that these raw features are insufficient to capture the infrequent and deviant nature of buggy design behaviors as compared to normal behavior (c.f., Figure 4).

Feature engineering is a critically important task in the successful generalization of ML to a problem domain [5], [12]. We engineer domain-specific features that are highly relevant to the diagnosis task to control the normal and buggy behavior model as seen by the outlier detection algorithms. The engineered features are generic, i.e., they are transformations that can be applied to any hardware design.

In our formulation we need the anomalous message sequences to appear as outliers. We identify infrequency and deviancy as the relevant features that capture the distinction between normal and anomalous message sequences. Our engineered feature space needs to capture this distinction: we would like normal message sequences to be close and densely distributed in the feature space, and anomalous message sequences to be sparsely distributed and distant. Due to the large number of possible message sequences, the analysis can become computationally expensive. To keep computation tractable, we pre-process trace message sequences to create message aggregates and characterize each such aggregate for anomaly.
A message aggregate with infrequent message sequences contains more information than a message aggregate with frequent message sequences. We use entropy [27] to quantify the information content of an aggregate. As the number of infrequent message sequences in an aggregate increases, the entropy of the aggregate increases monotonically. To quantify the deviancy of a message sequence with respect to the other message sequences in the aggregate, we use a string similarity metric, in particular the Levenshtein distance [13]. As an aggregate contains more and more deviant message sequences, the average pairwise Levenshtein distance of the aggregate increases monotonically. We identify message aggregates with both high entropy and high Levenshtein distance as outliers and report them as candidate root causes.

The primary benefits of this diagnosis solution are: i) it automatically learns the normal and buggy design behaviors from trace message data without labeled training data, ii) the engineered features are generic and independent of any particular design and/or application, and iii) the proposed method can sift through a large amount of trace data, thereby improving detection of candidate anomalous message sequences that are symptomatic of design bugs.
Fig. 1: 1a shows a flow for an exclusive line access request for a cache coherence flow [31] along with participating IPs. 1b shows two legally indexed instances of the cache coherence flow.

To show the scalability and effectiveness of our automated diagnosis approach, we perform our experiments on the industrial-scale OpenSPARC T2 SoC [29], [30]. We inject complex and subtle bugs, with each bug symptom taking several hundred observed messages (up to 457 messages) and up to 21,290,999 clock cycles to manifest. Our analysis shows that the proposed diagnosis method is computationally efficient. It incurred a runtime of up to 44.3 seconds and a peak memory usage of up to 508.7 MB to pre-process trace messages into aggregates. To detect outlier message aggregates, it incurred a runtime of up to 18.91 seconds and a peak memory usage of up to 508.2 MB.

We also evaluated the effectiveness of our engineered features for outlier detection. We found that the candidate anomalous message aggregates have entropy of up to 4.3482 and Levenshtein distance of up to 3.0. This shows that our engineered features are highly effective in demarcating anomalous message aggregates from normal aggregates.

Our analysis shows that our proposed diagnosis method is highly effective: it was able to diagnose up to 66.7% more injected bugs with up to 847x less diagnosis time, with a high precision of up to 0.769, as compared to manual debugging.

Our contributions over [22] are as follows. First, we pose the post-silicon diagnosis problem as an outlier detection problem and propose an ML-based scalable and efficient technique to diagnose post-silicon use-case failures. Second, we systematically model buggy behavior as an outlier and normal behavior as an inlier in the ML data space. To that end, we engineered two features that are highly relevant to the diagnosis task and applicable across hardware designs. Third, we establish with empirical evidence that our ML-based technique is highly effective and can diagnose many more bugs in a fraction of the time, with high precision, as compared to manual debugging.

II. BACKGROUND AND PRELIMINARIES
Conventions: In SoC designs, a message can be viewed as an assignment of Boolean values to the interface signals of a hardware IP. In our formalization below, we leave the definition of a message implicit, but we will treat it as a pair $\langle C, w \rangle$ where $w \in \mathbb{Z}^{+}$. Informally, $C$ represents the content of the message and $w$ represents the number of bits required to represent $C$. Given a message $m = \langle C, w \rangle$, we will refer to $w$ as the bit width of $m$, denoted by $width(m)$ or $|m|$.

Definition 1: A flow is a directed acyclic graph (DAG) defined as a tuple $\mathcal{F} = \langle \mathcal{S}, \mathcal{S}_0, \mathcal{S}_p, \mathcal{E}, \delta_{\mathcal{F}}, Atom \rangle$ where $\mathcal{S}$ is the set of flow states, $\mathcal{S}_0 \subseteq \mathcal{S}$ is the set of initial states, $\mathcal{S}_p \subseteq \mathcal{S}$ with $\mathcal{S}_p \cap Atom = \emptyset$ is the set of stop states, $\mathcal{E}$ is a set of messages, $\delta_{\mathcal{F}} \subseteq \mathcal{S} \times \mathcal{E} \times \mathcal{S}$ is the transition relation, and $Atom \subset \mathcal{S}$ is the set of atomic states of the flow.

Fig. 2: Two instances of the cache coherence flow of Figure 1a interleaved.

We use $\mathcal{F}.\mathcal{S}$, $\mathcal{F}.\mathcal{E}$, etc. to denote the individual components of a flow $\mathcal{F}$. A stop state of a flow is its final state after successful completion. $Atom$ is a mutex set of flow states, i.e., any two flow states in $Atom$ cannot happen together. Other components of $\mathcal{F}$ are self-explanatory. In Figure 1a, we show a toy cache coherence flow along with the participating IPs and the messages. In Figure 1a, $\mathcal{S} = \{Init, Wait, GntW, Done\}$, $\mathcal{S}_0 = \{Init\}$, $\mathcal{S}_p = \{Done\}$, and $Atom = \{GntW\}$. Each of the messages in the cache coherence flow is 1 bit wide, hence $\mathcal{E} = \{\langle ReqE, 1 \rangle, \langle GntE, 1 \rangle, \langle Ack, 1 \rangle\}$.
Definition 2: Given a flow $\mathcal{F}$, an execution $\rho$ is an alternating sequence of flow states and messages ending with a stop state. For flow $\mathcal{F}$, $\rho = s_0 \alpha_1 s_1 \alpha_2 s_2 \ldots \alpha_n s_n$ such that $s_i \xrightarrow{\alpha_{i+1}} s_{i+1}$, $\forall\, 0 \leq i < n$, $s_i \in \mathcal{F}.\mathcal{S}$, $\alpha_{i+1} \in \mathcal{F}.\mathcal{E}$, and $s_n \in \mathcal{F}.\mathcal{S}_p$. The trace of an execution $\rho$ is defined as $trace(\rho) = \alpha_1 \alpha_2 \ldots \alpha_n$.

For the cache coherence flow of Figure 1a, $\rho = \{n, ReqE, w, GntE, c, Ack, d\}$ and $trace(\rho) = \{ReqE, GntE, Ack\}$.

Intuitively, a flow provides a pattern of system execution. A flow can be invoked several times, even concurrently, during a single run of the system. To make precise the relation between an execution of the system and the participating flows, we need to distinguish between these instances of the same flow. The notion of indexing accomplishes that by augmenting a flow with an "index".

Definition 3: An indexed message is a pair $\alpha = \langle m, i \rangle$ where $m$ is the message and $i \in \mathbb{N}$ is referred to as the index of $\alpha$. An indexed state is a pair $\hat{s} = \langle s, j \rangle$ where $s$ is a flow state and $j \in \mathbb{N}$ is referred to as the index of $\hat{s}$. An indexed flow $\langle f, k \rangle$ is a flow consisting of indexed messages and indexed states, indexed by $k \in \mathbb{N}$.

Figure 1b shows two instances of the cache coherence flow of Figure 1a, indexed with their respective instance numbers. In our modeling, we ensure by construction that two different instances of the same flow do not have the same indices. Note that in practice, most SoC designs include architectural support to enable tagging, i.e., uniquely identifying different concurrently executing instances of the same flow. Our formalization simply makes the notion of tagging explicit.
Definition 4: Any two indexed flows $\langle \mathcal{F}, i \rangle$, $\langle \mathcal{G}, j \rangle$ are said to be legally indexed if either $\mathcal{F} \neq \mathcal{G}$, or $\mathcal{F} = \mathcal{G}$ and $i \neq j$.

Figure 1b shows two legally indexed instances of the cache coherence flow of Figure 1a. Indices uniquely identify each instance of the cache coherence flow.

A usage scenario is a pattern of frequently used applications. Each such pattern comprises multiple interleaved flows corresponding to communicating hardware IPs.
Definition 5: Let $\mathcal{F}$, $\mathcal{G}$ be two legally indexed flows. The interleaving $\mathcal{F} \parallel \mathcal{G}$ is a flow called the interleaved flow, defined as $\mathcal{U} = \mathcal{F} \parallel \mathcal{G} = \langle \mathcal{F}.\mathcal{S} \times \mathcal{G}.\mathcal{S},\ \mathcal{F}.\mathcal{S}_0 \times \mathcal{G}.\mathcal{S}_0,\ \mathcal{F}.\mathcal{S}_p \times \mathcal{G}.\mathcal{S}_p,\ \mathcal{F}.\mathcal{E} \cup \mathcal{G}.\mathcal{E},\ \delta_{\mathcal{U}},\ \mathcal{F}.Atom \cup \mathcal{G}.Atom \rangle$ where $\delta_{\mathcal{U}}$ is defined by the rules:

i) $\dfrac{s_1 \xrightarrow{\alpha} s_1' \;\wedge\; s_2 \notin \mathcal{G}.Atom}{\langle s_1, s_2 \rangle \xrightarrow{\alpha} \langle s_1', s_2 \rangle}$ and ii) $\dfrac{s_2 \xrightarrow{\beta} s_2' \;\wedge\; s_1 \notin \mathcal{F}.Atom}{\langle s_1, s_2 \rangle \xrightarrow{\beta} \langle s_1, s_2' \rangle}$

where $s_1, s_1' \in \mathcal{F}.\mathcal{S}$, $s_2, s_2' \in \mathcal{G}.\mathcal{S}$, $\alpha \in \mathcal{F}.\mathcal{E}$, and $\beta \in \mathcal{G}.\mathcal{E}$. Every path in the interleaved flow is an execution of $\mathcal{U}$ and represents an interleaving of the messages of the participating flows.

Rule i) of $\delta_{\mathcal{U}}$ says that if $s_1$ evolves to the state $s_1'$ when message $\alpha$ is performed and if $\mathcal{G}$ has a state $s_2$ which is not atomic/indivisible, then in the interleaved flow the state $(s_1, s_2)$ evolves to the state $(s_1', s_2)$ when message $\alpha$ is performed. A similar explanation holds for Rule ii) of $\delta_{\mathcal{U}}$. For any two concurrently executing legally indexed flows $\mathcal{F}$ and $\mathcal{G}$ with $\mathcal{J} = \mathcal{F} \parallel \mathcal{G}$, for any $s \in \mathcal{F}.Atom$ and any $s' \in \mathcal{G}.Atom$, $(s, s') \notin \mathcal{J}.\mathcal{S}$: if one flow is in one of its atomic/indivisible states, then no other concurrently executing flow can be in its atomic/indivisible state.

Figure 2 shows a partial interleaving $\mathcal{U}$ of the two legally indexed flow instances of Figure 1b. Since $c_1$ and $c_2$ are both atomic states, the state $(c_1, c_2)$ is an illegal state in the interleaved flow. $\delta_{\mathcal{U}}$ and the $Atom$ set ensure that such illegal states do not appear in interleaved flows.

Trace buffer availability is measured in bits, thus rendering the bit width of a message important. In Definition 6, we define a message combination. Different instances of the same message, i.e., indexed messages, are not required while computing the bit width of a message combination.
Definition 6: A message combination $\mathcal{M}$ is an unordered set of messages. The total bit width $W$ of a message combination $\mathcal{M}$ is the sum of the bit widths of the individual messages contained in $\mathcal{M}$, i.e., $W(\mathcal{M}) = \sum_{i=1}^{k} width(m_i) = \sum_{i=1}^{k} w_i$, $m_i \in \mathcal{M}$, $k = |\mathcal{M}|$.

We introduce a metric called flow specification coverage to evaluate the quality of a message combination.
Definition 7: Let $\mathcal{F}$ be a flow. The set of visible flow states $visible(\alpha)$ of a message $\alpha \in \mathcal{F}.\mathcal{E}$ is defined as the set of flow states reached on the occurrence of message $\alpha$, i.e., $visible(\alpha) = \{ s' \mid s \xrightarrow{\alpha} s';\ s, s' \in \mathcal{F}.\mathcal{S} \}$. The flow specification coverage $FCov(\mathcal{M})$ of a message combination $\mathcal{M}$ is defined as the set union of the visible flow states of all the messages in the message combination, expressed as a fraction of the total number of flow states, i.e., $FCov(\mathcal{M}) = \frac{|\cup_{i=1}^{k} visible(\alpha_i)|}{|\mathcal{F}.\mathcal{S}|}$, $k = |\mathcal{M}|$.

We extend the definition of the trace $trace(\rho)$ of an execution $\rho$ (c.f., Definition 2) to define message sequences and message aggregates for diagnosis.

Definition 8: A message sequence $m(\rho)$ of a $trace(\rho)$ is defined as a subsequence of the trace of the execution. The length $k$ of a message sequence $m(\rho)$ is defined as the number of messages contained in $m(\rho)$. For example, for $trace(\rho) = \alpha_1 \alpha_2 \alpha_3 \ldots \alpha_n$, $m(\rho) = \langle \alpha_1 \alpha_2 \alpha_3 \rangle$ is a message sequence of $trace(\rho)$ of length $k = 3$. Any two message sequences $m_i(\rho)$ and $m_j(\rho)$ of length $k$ are distinct if $\exists\, l \in [1, k]$ such that $\alpha_{i,l} \neq \alpha_{j,l}$, where $\alpha_{i,l} \in m_i(\rho)$ and $\alpha_{j,l} \in m_j(\rho)$.

Definition 9: A message aggregate $maggr(\rho)$ of a $trace(\rho)$ is defined as an unordered set of message sequences of length $k$. Each distinct message sequence in a message aggregate is called a unique message sequence of that message aggregate. For example, $maggr(\rho) = \{\langle \alpha_1 \alpha_2 \alpha_3 \rangle, \langle \alpha_2 \alpha_3 \alpha_4 \rangle\}$ is a message aggregate of length-3 message sequences of $trace(\rho)$. Each of $\langle \alpha_1 \alpha_2 \alpha_3 \rangle$ and $\langle \alpha_2 \alpha_3 \alpha_4 \rangle$ is a unique message sequence of $maggr(\rho)$.

For the cache coherence flow of Figure 1a, $m_1(\rho) = \langle ReqE, GntW \rangle$ and $m_2(\rho) = \langle GntW, Ack \rangle$ are two length-2 message sequences, and $maggr(\rho) = \{m_1, m_2\} = \{\langle ReqE, GntW \rangle, \langle GntW, Ack \rangle\}$ is a message aggregate.
A. Entropy and mutual information gain
Entropy: Entropy [27] measures the uncertainty in a random variable. Let $X$ be a discrete random variable with possible values $X_{val} = \{x_1, x_2, \ldots, x_n\}$. Let $p(x)$ be the associated probability mass function of $X$. The entropy of the random variable $X$ is defined as $H(X) = -\sum_{x_i \in X_{val}} p(x_i) \log_2 p(x_i)$ where $p(x_i) = |X = x_i| / |X_{val}|$ denotes the fraction of $X$ in which $X = x_i$.

Mutual information gain: Mutual information gain [27] measures the amount of information that can be obtained about one random variable $X$ by observing another random variable $Y$. More precisely, the conditional entropy of a random variable $X$ with respect to another random variable $Y$ is the reduction in uncertainty in the realization of $X$ when the outcome of $Y$ is known. For jointly distributed discrete random variables $X$ and $Y$, the mutual information gain of $X$ relative to $Y$ is given by $I(X; Y) = \sum_{x,y} p(x, y) \log_2 \left( \frac{p(x, y)}{p(x)\,p(y)} \right)$, where $p(x)$ and $p(y)$ are the marginal probability mass functions of $X$ and $Y$ respectively.
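To make the two definitions concrete, the following minimal Python sketch (our own illustration, not the paper's implementation) computes empirical entropy and mutual information gain from observed samples:

```python
from collections import Counter
from math import log2

def entropy(samples):
    """Empirical entropy H(X) in bits of a list of observed values."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def mutual_information(xs, ys):
    """Empirical mutual information gain I(X;Y) in bits from paired samples."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

# A fair coin has 1 bit of entropy, and a variable carries full information
# about itself, so I(X;X) equals H(X).
xs = ['h', 't', 'h', 't']
print(entropy(xs))                 # 1.0
print(mutual_information(xs, xs))  # 1.0
```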
B. Levenshtein distance

The Levenshtein distance is a string similarity metric measuring the dissimilarity between two strings. Mathematically, the Levenshtein distance between two strings $a$, $b$ (of length $|a|$ and $|b|$), written $L_{a,b}(|a|, |b|)$, is defined as:

$$L_{a,b}(i,j) = \begin{cases} \max(i,j) & \text{if } \min(i,j) = 0,\\[2pt] \min \begin{cases} L_{a,b}(i-1, j) + 1\\ L_{a,b}(i, j-1) + 1\\ L_{a,b}(i-1, j-1) + \mathbb{1}(a_i \neq b_j) \end{cases} & \text{otherwise,} \end{cases}$$

where $\mathbb{1}(a_i \neq b_j)$ is the indicator function equal to 0 when $a_i = b_j$ and equal to 1 otherwise. $L_{a,b}(i, j)$ is the distance between the first $i$ characters of $a$ and the first $j$ characters of $b$. We will denote $L_{a,b}(|a|, |b|)$ as $L(a, b)$.

The salient features of the Levenshtein distance are: i) it is at least the difference of the sizes of the two strings; ii) it is at most the length of the longer string; iii) it is zero if and only if the strings are equal; and iv) if the strings are of the same size, the Hamming distance [11] is an upper bound.
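The recurrence translates directly into the standard dynamic program below (a sketch of our own; it works on strings and on lists of messages alike):

```python
def levenshtein(a, b):
    """Levenshtein distance between sequences a and b via the recurrence above."""
    m, n = len(a), len(b)
    # dist[i][j] holds L_{a,b}(i, j), the distance between a[:i] and b[:j].
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i                       # base case: min(i, j) == 0
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dist[i][j] = min(
                dist[i - 1][j] + 1,                           # deletion
                dist[i][j - 1] + 1,                           # insertion
                dist[i - 1][j - 1] + (a[i - 1] != b[j - 1]),  # substitution
            )
    return dist[m][n]

print(levenshtein('aba', 'bab'))  # 2: one deletion plus one insertion
print(levenshtein('aba', 'cdc'))  # 3: three substitutions
```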
III. OUTLIER DETECTION

A. Outliers in ML
In ML, outliers are defined as data samples whose characteristics notably deviate from our expectation [7], [10]. Outliers have two characteristics: i) they are different from and ii) they are rare as compared to normal data samples.

In spite of this straightforward definition, detecting an outlier is challenging. First, the boundary between outliers and normal samples is often imprecise. Additionally, some outliers only manifest their outlierness in an engineered feature space derived from the original feature space via non-trivial transformations. Second, the ground truth of the outliers is often absent due to prohibitive labeling cost. Hence, in many cases, outliers are determined based on sample characteristics alone. Unsupervised outlier detection algorithms identify outliers through the patterns and intrinsic properties of the feature space alone, and hence do not require any ground-truth labels.
B. Different notions of outliers
There are three distinct notions of outliers, depending on how normal samples are profiled: i) classification-based, ii) density-based, and iii) spectral-based.

Classification-based outlier detection: Outliers can be defined by a classifier that is learned in the feature space to distinguish between normal and anomalous samples [7]. Any sample that does not fit the representation of the normal samples is considered an outlier. When the ground truth is unavailable, the classifier can be learned in an unsupervised manner. One-class Support Vector Machine (OCSVM) [2], [26] is an unsupervised outlier detection method that adopts this notion of outliers.
Density-based outlier detection: Density-based outliers rest on the assumption that normal data samples reside in neighborhoods of high density whereas outliers reside in low-density regions [7]. There are two distinct notions of density-based outlierness. First, the local density of a sample can be estimated by its distance to its k nearest neighbors (this distance can be the distance to the k-th most distant neighbor or the average of the distances to all k neighbors), with larger distances indicating higher degrees of outlierness. The k-Nearest Neighbors (kNN) [3], [25] technique is an unsupervised outlier detection technique that adopts this notion of outliers directly and uses the distance to quantify outlierness. Second, the relative density of each data sample with respect to the density of its k neighbors can be used as an indication of the outlierness of a sample. A normal sample has a local density similar to that of its neighbors, whereas an outlier's local density is lower than that of its neighbors. Local Outlier Factor (LOF) [6] is an unsupervised outlier detection method that identifies outliers based on the relative density of a sample's neighborhood.
Spectral-based outlier detection: The spectral-based notion of outliers assumes that the difference between normal samples and outliers can be significantly enhanced when the data is embedded into a lower-dimensional subspace [7]. Hence, outlier detection methods that adopt this notion approximate the data space using a transformation of the original features to capture the variability in the data for easy outlier identification.
Principal Component Analysis (PCA) [15] is an unsupervised outlier detection method that projects data into a lower-dimensional space where most of the variability of the data is captured and explained by the new dimensions. The variability that is not captured by the new dimensions is considered anomalous.

Fig. 3: Our message selection approach. Step 1: find message combinations (input: system-level flows and trace buffer width; output: message combinations with information gain maximized and the trace buffer maximally utilized). Step 2: select a message combination based on mutual information gain. Step 3: pack the trace buffer.
Isolation Forest (IForest) [16], [17] is another unsupervised outlier detection method that attempts to identify outliers using only a subset of the features. IForest recursively selects features and splits feature values at random until samples are isolated. Since outliers are rare and lie further away from the normal samples in the feature space, the number of splits required to isolate an outlier serves as its outlier score.
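As a small illustration of these notions (assuming the PyOD library [35] that we use in Section VI; the toy data is our own), a density-based and an isolation-based detector both flag a distant point sitting next to a dense cluster:

```python
import numpy as np
from pyod.models.lof import LOF          # density-based (relative density)
from pyod.models.iforest import IForest  # isolation-based

rng = np.random.default_rng(0)
inliers = rng.normal(loc=0.0, scale=0.1, size=(50, 2))  # one dense cluster
outlier = np.array([[3.0, 3.0]])                        # sparse, distant point
X = np.vstack([inliers, outlier])

for clf in (LOF(n_neighbors=10), IForest(random_state=0)):
    clf.fit(X)  # unsupervised: no ground-truth labels are given
    # labels_: 0 = inlier, 1 = outlier; decision_scores_: higher = more outlying
    print(type(clf).__name__, clf.labels_[-1], clf.decision_scores_[-1])
```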
C. Metrics of an outlier detection algorithm
Definition 10: The precision of an outlier detection algorithm is defined as the number of true positives expressed as a fraction of the total number of samples labeled as belonging to the outlier class, i.e., $Precision = \frac{t_p}{t_p + f_p}$, where $t_p$ is the number of true positives and $f_p$ is the number of false positives.

Definition 11: The recall of an outlier detection algorithm is defined as the number of true positives expressed as a fraction of the total number of true positives and false negatives, i.e., $Recall = \frac{t_p}{t_p + f_n}$, where $f_n$ is the number of false negatives.

Definition 12: The accuracy of an outlier detection algorithm is defined as the number of samples that are correctly labeled as belonging to both the outlier class and the normal class, expressed as a fraction of the total number of samples, i.e., $Accuracy = \frac{t_p + t_n}{t_p + t_n + f_p + f_n}$, where $t_n$ is the number of true negatives.
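The three metrics reduce to a few lines of code; the counts in the example calls are hypothetical:

```python
def precision(tp, fp):
    """Definition 10: fraction of samples labeled as outliers that truly are."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Definition 11: fraction of true outliers that were detected."""
    return tp / (tp + fn)

def accuracy(tp, tn, fp, fn):
    """Definition 12: fraction of all samples that were labeled correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts: 10 outliers found, 3 false alarms, 2 misses, 85 normals.
print(round(precision(10, 3), 3))        # 0.769
print(round(recall(10, 2), 3))           # 0.833
print(round(accuracy(10, 85, 3, 2), 3))  # 0.95
```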
IV. MESSAGE SELECTION METHODOLOGY

A. Objective of message selection methodology
Maximizing information gain increases flow specification coverage during post-silicon debug of usage scenarios. The message selection procedure considers the message combination $\mathcal{M}$ for tracing, whereas to calculate information gain over $\mathcal{U}$, it uses indexed messages.

Given a set of legally indexed participating flows of a usage scenario, the bit widths of the associated messages, and a trace buffer width constraint, our method selects a message combination such that information gain is maximized over the interleaved flow $\mathcal{U}$ and the trace buffer is maximally utilized.

For the cache coherence flow example of Figure 1a, we assume a trace buffer width of 2 bits and concurrent execution of two instances of the flow. The ReqE, GntE, and Ack messages each pass between a pair of participating IPs. ReqE, GntE, and Ack consist of the req, gnt, and ack signals respectively, and each of the messages is 1 bit wide. Let $\mathbb{B} = \{0, 1\}$ be the set of Boolean values. $C(ReqE) = \mathbb{B}^{|req|}$, $C(GntE) = \mathbb{B}^{|gnt|}$, and $C(Ack) = \mathbb{B}^{|ack|}$ denote the respective message contents.

B. Step 1: Finding message combinations
In Step 1, we identify all possible message combinations from the set of all messages of the participating flows in a usage scenario. While we enumerate the different message combinations, we also calculate the total bit width of each combination. Any message combination that has a total bit width less than or equal to the available trace buffer width is kept for further analysis in Step 2; each such message combination is a potential candidate for tracing. (For multi-cycle messages, the number of bits that can be traced in a single cycle is considered as the message bit width.)

In the example of Figure 1a, there are 3 messages and $\sum_{k=1}^{3} \binom{3}{k} = 7$ different message combinations. Of these, only one ($ReqE, GntE, Ack$) has a bit width more than the trace buffer width (2). We retain the remaining six message combinations for further analysis in Step 2.
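A minimal sketch of this step for the running example (the helper name and data layout are ours):

```python
# Enumerate all message combinations that fit within the trace buffer.
from itertools import combinations

def candidate_combinations(widths, buffer_width):
    """All message combinations whose total bit width <= buffer_width.

    widths: dict mapping message name -> bit width.
    """
    names = sorted(widths)
    keep = []
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            if sum(widths[m] for m in combo) <= buffer_width:
                keep.append(combo)
    return keep

widths = {'ReqE': 1, 'GntE': 1, 'Ack': 1}   # each message is 1 bit wide
print(candidate_combinations(widths, buffer_width=2))
# six combinations survive; only ('Ack', 'GntE', 'ReqE'), of width 3, is dropped
```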
C. Step 2: Selecting a message combination based on mutual information gain

In this step, we compute the mutual information gain of the message combinations computed in Step 1 over the interleaved flow. We then select the message combination that has the highest mutual information gain for tracing.

We use mutual information gain as a metric to evaluate the quality of the selected set of messages with respect to the interleaving of a set of flows. We associate two random variables with the interleaved flow, namely $X$ and $Y_i$. $X$ represents the different states of the interleaved flow, i.e., it can take any value in the set $\mathcal{S}$ of states of the interleaved flow. Let $M = \bigcup_i \mathcal{E}_i$ be the set of all possible indexed messages in the interleaved flow. Let $Y_i'$ be a candidate message combination and $Y_i$ be a random variable representing all indexed messages corresponding to $Y_i'$. All values of $X$ are equally probable since the interleaved flow can be in any state, hence $p_X(x) = 1/|\mathcal{S}|$. To find the marginal distribution of $Y_i$, we count the number of occurrences of each indexed message over the entire interleaved flow; $p_{Y_i}(y)$ is the fraction of these occurrences contributed by the indexed message $y$. To find the joint probability, we use the conditional probability and the marginal distribution, i.e., $p(x, y) = p(x|y)\,p(y) = p(y|x)\,p(x)$. $p(x|y)$ can be calculated as the fraction of the interleaved flow states in which $x$ is reached after the message $Y_i = y$ has been observed; in other words, $p_{X|Y_i}(x|y)$ is the fraction of times $x$ is reached out of the total number of occurrences of the indexed message $y$ in the interleaved flow. We then substitute these values into $I(X; Y_i)$ to calculate the mutual information gain of the state set $X$ w.r.t. $Y_i$.

In Figure 2, $p_X(x) = 1/|\mathcal{S}|$ for all $x \in \mathcal{S}$. Let $Y' = \{GntE, ReqE\}$ be a candidate message combination and $Y$ the corresponding set of indexed messages. For $I(X; Y)$, we compute $p(y = y_i)$ for each $y_i \in Y$. After observing $GntE_1$, flow 1 is in state $c_1$, so $p_{X|Y}(x \mid GntE_1) = 1/3$ for each of $x = (c_1, n_2)$, $(c_1, w_2)$, and $(c_1, d_2)$, and $p_{X,Y}(x, GntE_1) = p_{X|Y}(x \mid GntE_1)\, p_Y(GntE_1)$ for those states. Similarly, we calculate $p_{X,Y}(x, GntE_2)$, $p_{X,Y}(x, ReqE_1)$, and $p_{X,Y}(x, ReqE_2)$. The mutual information gain is then given by $I(X; Y) = \sum_{x,y} p(x, y) \log_2 \big( p(x, y) / (p(x)\,p(y)) \big)$.

Similarly, we calculate the mutual information gain for the remaining five message combinations and select the one with the highest mutual information gain, thereby selecting the message combination $Y' = \{ReqE, GntE\}$ for tracing. Intuitively, in an execution of $\mathcal{U}$ of Figure 2, once the traced messages are observed, we are immediately able to localize the execution to the two paths shown in red in Figure 2 among the many possible paths of $\mathcal{U}$.

Fig. 4: (a), (b), and (c) show the inability of raw feature data (legal IP-pair index, message index, cycle range) to demarcate anomalous message sequences (case studies 1, 3, and 5; k = 5).
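The probability model just described can be sketched as follows; the interleaved flow is represented as a list of (state, message, next state) transitions, and all function and variable names are our own illustration rather than the paper's implementation:

```python
# A simplified sketch of Step 2: score candidate combinations by I(X;Y).
from collections import Counter
from math import log2

def mi_gain(transitions, states, combo):
    """Mutual information gain of states X w.r.t. the messages in combo.

    transitions: list of (state, message, next_state) edges of the
    interleaved flow; states: set of interleaved-flow states.
    """
    edges = [(m, t) for (_, m, t) in transitions if m in combo]
    if not edges:
        return 0.0
    total = len(edges)
    msg_count = Counter(m for m, _ in edges)    # occurrences of each message
    joint_count = Counter(edges)                # (message, reached state) pairs
    p_x = 1.0 / len(states)                     # all states equally probable
    gain = 0.0
    for (m, x), c in joint_count.items():
        p_y = msg_count[m] / total              # marginal of message m
        p_xy = (c / msg_count[m]) * p_y         # p(x|y) * p(y)
        gain += p_xy * log2(p_xy / (p_x * p_y))
    return gain

def select_combination(transitions, states, candidates):
    """Pick the candidate message combination with the highest gain."""
    return max(candidates, key=lambda c: mi_gain(transitions, states, c))
```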
D. Step 3: Packing the trace buffer

The message combination with the highest mutual information gain selected in Step 2 may not completely fill the trace buffer. To maximize trace buffer utilization, in this step we pack smaller message groups that are small enough to fit in the leftover trace buffer width. Usually, these smaller message groups are part of a larger message that cannot fit into the trace buffer; e.g., in OpenSPARC T2, dmusiidata is a 20-bit wide message whereas cputhreadid, a subgroup of dmusiidata, is 6 bits wide. We select a message group that can fit into the leftover trace buffer width such that the information gain of the selected message combination in union with this smaller message group is maximal. We repeat this step until no more smaller message groups can be added to the leftover trace buffer. The benefits of packing are shown empirically in Section VII-A.

In our running example, the trace buffer is filled up by the set of selected message combinations. The flow specification coverage achieved with $Y'$ is 0.7333.
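Packing is a greedy loop; below is a sketch under the assumption that a scoring function gain_fn (for example, the mutual information gain of Step 2) and a table of subgroup widths are available. All names are illustrative:

```python
def pack_trace_buffer(selected, leftover, groups, gain_fn):
    """Greedily pack small message groups into the leftover buffer bits.

    selected: messages already chosen in Step 2; leftover: unused buffer
    bits; groups: dict mapping group name -> bit width (e.g., subfields of
    wide messages, such as a 6-bit cputhreadid inside a 20-bit dmusiidata);
    gain_fn(msgs): information gain of a message set.
    """
    packed = list(selected)
    while True:
        fitting = {g: w for g, w in groups.items()
                   if w <= leftover and g not in packed}
        if not fitting:
            break  # nothing else fits: the buffer is maximally utilized
        # pick the group whose union with the current selection scores best
        best = max(fitting, key=lambda g: gain_fn(packed + [g]))
        packed.append(best)
        leftover -= fitting[best]
    return packed, leftover
```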
V. BUG SYMPTOM DIAGNOSIS METHODOLOGY

A. Formulation of post-silicon debug as an outlier detection problem
A post-silicon execution passes if it finishes without any failures (e.g., hangs, deadlock, livelock, crash, etc.); otherwise the execution fails. For the diagnosis problem, we consider the messages traced during execution as input data. In a post-silicon execution, a failure happens due to the occurrence of one or more message sequences that are symptomatic of one or more design bugs. We consider such a message sequence an anomalous message sequence. Since an anomalous message sequence represents a deviant design behavior, we consider such a message sequence an outlier in the post-silicon execution data space. Consequently, we formulate post-silicon diagnosis as an outlier detection problem. Given a set of anomalous post-silicon executions, our diagnosis method identifies one or more candidate anomalous message sequences.
Since post-silicon executions span millions of clock cycles, for tractable computation we segregate raw trace data into multiple cycle ranges. Further, we assign an index to every legal IP pair (an IP pair is legal if a message is passed between them) and to every unique message that happens in a post-silicon execution (this index is an enumeration of traced messages and is different from the indexed messages discussed in Definition 3). The segregated trace data has three raw features: i) the cycle range in which the message occurred, ii) the index of the legal IP pair between which the message occurred, and iii) the index of the message that occurred. In Figure 4 we show raw trace data in this three-dimensional feature space for several case studies (c.f., Section VIII) of the OpenSPARC T2 SoC.
B. Insufficiency of raw features for detection
An anomalous message sequence has two primary characteristics: i) it is infrequent and ii) it is deviant from other, normal message sequences. An in-depth inspection of Figure 4 shows that the trace data in the raw feature space has the following deficiencies: i) the raw features provide only message-specific information, ii) in the raw feature space outliers are not well demarcated, and iii) the raw features fail to provide the context of the failure during diagnosis.

Hence, we pre-process raw trace message data to construct message sequences and characterize each such message sequence for infrequency and deviancy using engineered features (c.f., Section III-A). The computational cost of analyzing each individual message sequence can be prohibitively large due to the large number of message sequences obtained from traces. To keep the computational cost nominal, instead of analyzing each message sequence individually, we analyze message aggregates of message sequences and characterize each such aggregate for anomaly.
C. Intuition of engineered features
To quantify the characterization of anomalousness, we calculate two engineered feature values for each message aggregate: i) entropy (characterizes infrequency) and ii) Levenshtein distance (characterizes deviancy).
Entropy as an engineered feature: A message aggregate is characterized as anomalous if it contains one or more infrequent unique message sequences. An aggregate is considered to be more anomalous if it contains many such infrequent unique message sequences. An information-theoretic way to quantify the notion of infrequency is to compute the information content of the aggregate. Entropy is one such metric that succinctly quantifies information content. An aggregate with frequent unique message sequences will have less entropy due to less information content. On the other hand, an aggregate with more and more infrequent unique message sequences will have higher entropy due to higher information content. The entropy of a message aggregate is lower bounded by 0.0 (when the aggregate contains exactly one unique message sequence) and is upper bounded by $\log_2(n)$ (when the aggregate contains exactly one of each of $n$ unique message sequences).

Levenshtein distance as an engineered feature: Entropy fails to characterize the specific relationship that exists between the individual unique message sequences of a message aggregate. Consequently, we calculate a similarity metric, in particular the Levenshtein distance (c.f., Section II-B), to quantify the deviancy of the constituent message sequences in a message aggregate. If a message aggregate contains similar unique message sequences, the dissimilarity score will be small, whereas if the message aggregate contains deviant unique message sequences, the dissimilarity score will be large. A message aggregate with a higher Levenshtein distance is likely to be more anomalous than another message aggregate with a smaller Levenshtein distance. The Levenshtein distance of a message aggregate is lower bounded by 0.0 (when the aggregate contains exactly one unique message sequence) and is upper bounded by the average of the pairwise Hamming distances [11] of the unique message sequences (when the aggregate contains $n$ different unique message sequences).

Let us consider aggregates A1: {'aba', 'bab'} and A2: {'aba', 'cdc'}, where a, b, c, d are messages. For each of A1 and A2, the entropy is $\log_2(2) = 1$. Although A2 comprises dissimilar unique message sequences as compared to A1, entropy alone fails to capture that dissimilarity. Hence we calculate the Levenshtein distance of each of the aggregates to quantify the dissimilarity of the constituent message sequences. For A1, L('aba', 'bab') = 2 (1 deletion and 1 insertion) and for A2, L('aba', 'cdc') = 3 (3 substitutions). Clearly, in spite of having the same entropy, the Levenshtein distance helps identify A2 as more anomalous than A1.

In our diagnosis solution, we define a message aggregate as anomalous (i.e., containing anomalous unique message sequences) if it has both high entropy and high Levenshtein distance. Table I summarizes our definition of the anomalousness of a message aggregate.

TABLE I: Definition of anomalies using the engineered features entropy and Levenshtein distance. Ldist: Levenshtein distance. ✓: non-anomalous message aggregate. ✗: anomalous message aggregate.

                Entropy
  Ldist     Low      High
  Low        ✓        ✓
  High       ✓        ✗

Usage of outlier detection algorithms: We apply outlier detection algorithms to the engineered feature data space spanning entropy and Levenshtein distance. In the engineered feature space, message aggregates that represent normal behavior will be very close to each other and will form a dense cluster.
On the other hand, message aggregates that represent anomalous behavior will be sparsely distributed and distant from the normal message aggregates. The outlier algorithms output a list of anomalous message aggregates ranked by outlier score. We output the message sequences contained in the top-five anomalous message aggregates as candidate anomalies.

Fig. 5: Example execution trace and a set of message sequences of length k = 2 and granularity g = 100 cycles.

D. Example for generating engineered feature values from raw feature values
We use the example trace of Figure 5 to explain the steps for generating engineered feature values. This methodology is parameterized by i) the length $k$ of the message sequences for which anomalies need to be detected and ii) the granularity $g$, in number of cycles, at which message aggregates are created. For this example, we use $k = 2$ and $g = 100$.

Step 1 (Creation of message aggregates): We use a sliding window of length $k$ to create a set of $k$-length message sequences. The set of message sequences is partitioned into message aggregates based on the granularity $g$. In the example, the set of two-length message sequences is $S = \{ab, ba, ab, ba, ac\}$. We partition $S$ at a granularity of 100 cycles, which creates two message aggregates $s_1 = \{X, Y, X\}$ and $s_2 = \{X, Y, Z\}$ where $X = ab$, $Y = ba$, $Z = ac$.

Step 2 (Identifying unique message sequences and their occurrences per message aggregate): We identify the unique message sequences per message aggregate and calculate their numbers of occurrences. In this example, $s_1$ has two unique message sequences $X$ and $Y$, and $s_2$ has three unique message sequences $X$, $Y$, and $Z$. In $s_1$, $X$ occurs two times and $Y$ occurs one time. In $s_2$, each of $X$, $Y$, and $Z$ occurs one time.

Step 3 (Calculation of entropy and Levenshtein distance per message aggregate): We calculate the entropy and Levenshtein distance of each message aggregate using the information about unique message sequences from Step 2. In the example, for aggregate $s_1$, $p(X) = 2/3$ and $p(Y) = 1/3$. Hence $H(s_1) = -p(X)\log_2 p(X) - p(Y)\log_2 p(Y) = -2/3 \cdot \log_2(2/3) - 1/3 \cdot \log_2(1/3) = 0.9182$, with $L(X, Y) = 2$, $L(X, X) = 0$, and $L(Y, X) = 2$. The average Levenshtein distance of aggregate $s_1$ is $(2 + 0 + 2)/3 \approx 1.33$.

Similarly, for aggregate $s_2$, $p(X) = p(Y) = p(Z) = 1/3$. Hence $H(s_2) = -p(X)\log_2 p(X) - p(Y)\log_2 p(Y) - p(Z)\log_2 p(Z) = -3 \cdot 1/3 \cdot \log_2(1/3) = 1.58$, with $L(X, Y) = 2$, $L(X, Z) = 1$ (one substitution), and $L(Y, Z) = 2$. The average Levenshtein distance of aggregate $s_2$ is $(2 + 1 + 2)/3 \approx 1.67$.

The aggregates $s_1$ and $s_2$ are thus represented by the tuples (0.9182, 1.33) and (1.58, 1.67) respectively in the engineered feature space. We input these tuples to outlier detection algorithms to detect anomalous message aggregates.
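The three steps can be reproduced end to end with a short script (our own illustration of the methodology, using a compact variant of the Levenshtein helper of Section II-B):

```python
# Reproducing the worked example: sliding window, aggregation, and the two
# engineered features.
from collections import Counter
from itertools import combinations
from math import log2

def levenshtein(a, b):
    """Single-row dynamic program for the Levenshtein distance of Section II-B."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def features(aggregate):
    """(entropy, average pairwise Levenshtein distance) of a message aggregate."""
    counts = Counter(aggregate)
    n = len(aggregate)
    entropy = -sum((c / n) * log2(c / n) for c in counts.values())
    pairs = list(combinations(aggregate, 2))
    avg_ldist = sum(levenshtein(a, b) for a, b in pairs) / len(pairs)
    return entropy, avg_ldist

# Step 1: the two aggregates derived above from the k = 2 sliding window
# over the trace of Figure 5, partitioned at g = 100 cycles.
X, Y, Z = ('a', 'b'), ('b', 'a'), ('a', 'c')
s1 = [X, Y, X]   # cycle range 1
s2 = [X, Y, Z]   # cycle range 2

# Steps 2-3: per-aggregate engineered feature tuples.
print(features(s1))  # (0.918..., 1.333...)
print(features(s2))  # (1.584..., 1.666...)
```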
Fig. 6: Block diagram of OpenSPARC T2 processor. NCU:Non-cacheable unit, MCU: Memory Controller Unit [29], [30]
Fig. 7: Experimental setup to convert design signals to flow messages.
E. Limitations of the engineered features
Our proposed engineered features can diagnose a wide range of post-silicon use-case failures. However, we do not claim that our features are sufficient to diagnose arbitrary post-silicon use-case failures. For instance, if a buggy post-silicon execution trace consists of very few unique messages repeated over and over (e.g., 'abcabcabc...' where a, b, and c are unique messages), then the message aggregates will contain frequently occurring, similar unique message sequences. This results in low entropy (due to the frequent occurrence of each of the unique message sequences in the aggregate) and a small average Levenshtein distance per message aggregate (due to the similar unique message sequences in the aggregate), causing our method to fail to diagnose the bug. Certain classes of bugs may escape our diagnosis method if the engineered features fail to demarcate correct and buggy behaviors in the engineered feature space. However, this does not limit the practical applicability of our diagnosis solution: it expedites predominantly manual post-silicon debugging by several orders of magnitude (c.f., Section VIII-F). To make our solution comprehensive, additional engineered features would be needed.
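A quick numeric illustration of this failure mode on a toy trace of our own construction:

```python
from collections import Counter
from math import log2

trace = 'abc' * 20                 # repetitive buggy execution 'abcabc...'
k = 3
windows = [trace[i:i + k] for i in range(len(trace) - k + 1)]

counts = Counter(windows)          # only {'abc', 'bca', 'cab'} ever occur
n = len(windows)
H = -sum((c / n) * log2(c / n) for c in counts.values())
print(round(H, 2))  # ~1.58 bits, far below the log2(58) ~ 5.86 of an
                    # all-unique aggregate, and flat however long the trace
                    # runs; pairwise Levenshtein distances are likewise small
```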
VI. EXPERIMENTAL SETUP
Design testbed: We primarily use the publicly available OpenSPARC T2 SoC [29], [30] to demonstrate our results. Figure 6 shows an IP-level block diagram of the T2. The three different usage scenarios considered in our debugging case studies are shown in Table II along with the participating flows (columns 2-6) and participating IPs (column 7). We also use the USB design [33] to compare with other methods that cannot scale to the T2.
Testbenches: We used 37 different tests from the fc1_all_T2 regression environment. Each test exercises two or more IPs and their associated flows. We monitored message communication across participating IPs and recorded the messages into an output trace file using the SystemVerilog monitor of Figure 7. We also recorded the status (passing/failing) of each of the tests.

TABLE II: Usage scenarios and participating flows in T2. UID: usage scenario ID. PI: participating IPs. PRC: number of potential root causes. PIOR: PIO read, PIOW: PIO write, NCUU: NCU upstream, NCUD: NCU downstream, and Mon: Mondo interrupt flow. ✓ indicates that a scenario executes a flow and ✗ indicates that it does not. Flows are annotated with (no. of flow states, no. of messages).

UID | PIOR (6,5) | PIOW (3,2) | NCUU (4,3) | NCUD (3,2) | Mon (6,5) | PI                 | PRC
S1  |     ✓      |     ✓      |     ✗      |     ✗      |     ✓     | NCU, DMU, SIU      | 9
S2  |     ✗      |     ✗      |     ✓      |     ✓      |     ✓     | NCU, MCU, CCX      | 8
S3  |     ✓      |     ✓      |     ✓      |     ✓      |     ✗     | NCU, MCU, DMU, SIU | 9

TABLE III: Representative bugs injected in IP blocks of OpenSPARC T2, listed by bug ID, bug depth, bug category, bug type, and buggy IP. Bug depth indicates the hierarchical depth of an IP block from the top. Bug type is the functional implication of a bug.
Bug injection: We created 5 different buggy versions of the T2, which we analyze as five different case studies. Each case study comprises 5 different IPs, and we injected a total of 14 different bugs across the 5 IPs in each case. The injected bugs derive from two sources: i) sanitized examples of communication bugs received from our industrial partners and ii) the "bug model" developed at Stanford University in the QED [14] project, capturing commonly occurring bugs in an SoC design. A few representative injected bugs are detailed in Table III, which shows that the set of injected bugs is complex, subtle, and realistic. It took up to 457 observed messages and up to 21,290,999 clock cycles for each bug symptom to manifest, demonstrating the complexity and subtlety of the injected bugs. Following [29], [30] and Table III, we identified several potential architectural causes that can cause an execution of a usage scenario to fail. Column 8 of Table II shows the number of potential root causes per usage scenario.
Anomaly detection techniques: We used six different outlier detection algorithms, namely IForest, PCA, LOF, LkNN (kNN with the longest-distance method), MukNN (kNN with the mean-distance method), and OCSVM, from PyOD [35]. We applied each of these outlier detection algorithms to the failure trace data generated from each of the five case studies to diagnose the anomalous message sequences that are symptomatic of each injected bug per case study.
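The detector roster can be instantiated as follows (a sketch assuming PyOD's model classes; PyOD's kNN `method` options 'largest' and 'mean' correspond to the LkNN and MukNN variants named above):

```python
from pyod.models.iforest import IForest
from pyod.models.pca import PCA
from pyod.models.lof import LOF
from pyod.models.knn import KNN
from pyod.models.ocsvm import OCSVM

detectors = {
    'IForest': IForest(),
    'PCA': PCA(),
    'LOF': LOF(),
    'LkNN': KNN(method='largest'),  # kNN scored by the longest-distance method
    'MukNN': KNN(method='mean'),    # kNN scored by the mean-distance method
    'OCSVM': OCSVM(),
}

def rank_aggregates(X):
    """Per detector, return aggregate indices ranked most-outlying first.

    X: engineered feature matrix with one (entropy, Levenshtein distance)
    row per message aggregate.
    """
    ranking = {}
    for name, clf in detectors.items():
        clf.fit(X)                     # unsupervised: no labels needed
        scores = clf.decision_scores_  # higher score = more outlying
        ranking[name] = sorted(range(len(X)), key=lambda i: -scores[i])
    return ranking
```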
VII. EXPERIMENTAL RESULTS ON MESSAGE SELECTION
In this section, we provide insights into the effectiveness of our message selection technique using five different (buggy) case studies across three usage scenarios of the T2.
TABLE IV: Trace buffer utilization, flow specification coverage, and path localization of traced messages for 3 different usage scenarios. FSP Cov: flow specification coverage (Definition 7). WP: with packing. WoP: without packing. A 32-bit wide trace buffer is assumed.

Case  | Usage    | Trace Buffer Utilization | FSP Cov          | Path Localization
study | Scenario | WP        WoP            | WP       WoP     | WP       WoP
1     | S1       | 96.88%    84.37%         | 99.86%   97.22%  | 0.13%    3.23%
2     |          |                          |                  | 0.31%    6.11%
3     | S2       | 100%      71.87%         | 99.69%   93.75%  | 0.26%    5.13%
4     |          |                          |                  | 0.10%    2.47%
5     | S3       | 100%      93.75%         | 83.33%   77.78%  | 0.11%    2.65%
Fig. 8: Correlation analysis between mutual information gain and flow specification coverage for different message combi-nations for three different usage scenarios.
A. Flow specification coverage and trace buffer utilization
Table IV demonstrates the value of the traced messages with respect to flow specification coverage (Definition 7) and trace buffer utilization, the two objectives for which our message selection is optimized. Messages selected without packing achieve up to 93.75% trace buffer utilization with up to 97.22% flow specification coverage. With packing, message selection achieves up to 100% trace buffer utilization and up to 99.86% flow specification coverage. This shows that we can cover most of the desired functionality while utilizing the trace buffer maximally.
B. Path localization during debug of traced messages
In this experiment, we use buggy executions and traced messages to show the extent of path localization per bug. Localization is calculated as a fraction of the total paths of the interleaved flow. In Table IV, columns 7 and 8 show the extent of path localization. We needed to explore no more than 6.11% of the interleaved flow paths using our selected messages; with packing, we needed to explore no more than 0.31% of the total interleaved flow paths during debugging. Even with packing, subtle bugs like the NCU bugs of buggy designs 2 and 3 needed more paths to explore.

TABLE V: Comparison of signals selected by our method with SigSeT [4] and PRNet [18] for the USB design. P: partial bits of the signal selected.

Signal Name   | Module           | SigSeT | PRNet | Info Gain
rx data       | UTMI line speed  |   ✗    |   ✓   |    ✓
rx valid      |                  |   ✗    |   ✓   |    ✓
rx data valid | Packet decoder   |   ✗    |   ✗   |    ✓
token valid   |                  |   ✗    |   ✗   |    ✓
rx data done  |                  |   ✗    |   ✗   |    ✓
tx data       | Packet assembler |   ✗    |   ✗   |    ✓
tx valid      |                  |   ✗    |   ✓   |    ✓
send token    | Protocol engine  |   ✗    |   ✗   |    ✓
token pid sel |                  |   P    |   P   |    ✓
data pid sel  |                  |   P    |   ✗   |    ✓

TABLE VI: Selection of important messages by our method.

Message | Affecting Bug IDs | Bug coverage | Message importance | Selected Y/N | Usage scenario
m1      | 8, 33, 36         | 0.21         | 4.76               | Y            | 1, 2
m2      | 8, 33, 34, 36     | 0.28         | 3.57               | Y            | 1, 2
m3      | 33, 36            | 0.14         | 7.14               | Y            | 1, 2
m4      | 8, 29, 33         | 0.21         | 4.76               | Y            | 1, 3
m5      | 18, 33            | 0.14         | 7.14               | Y            | 1, 2
m6      | -                 | -            | -                  | N            | -
m7      | -                 | -            | -                  | Y            | 1, 3
m8      | 33                | 0.07         | 14.28              | Y            | 2
m9      | 1, 33             | 0.14         | 7.14               | N            | -
m10     | 24                | 0.07         | 14.28              | Y            | 2
m11     | 1, 24             | 0.14         | 7.14               | Y            | 2
m12     | 24                | 0.07         | 14.28              | Y            | 2
m13     |                   |              |                    |              |
m14     | 1, 17, 33         | 0.21         | 4.76               | Y            | 2
m15     | 1, 17, 18, 33     | 0.28         | 3.57               | N            | -
m16     | 1, 17, 18, 33     | 0.28         | 3.57               | Y            | 2, 3

Fig. 9: Root causing the buggy IP. (a) Number of traced messages investigated vs. number of pruned candidate legal IP pairs. (b) Number of selected messages investigated vs. number of pruned potential root causes (case studies 1-5).
C. Validity of information gain as message selection metric
We select messages per usage scenario. In Figure 8 we analyze the correlation between flow specification coverage and the mutual information gain of the selected messages. Flow specification coverage (Definition 7) increases monotonically with the mutual information gain over the interleaved flow of the corresponding usage scenario. This establishes that an increase in mutual information gain corresponds to higher flow specification coverage, indicating that mutual information gain is a good metric for message selection.
D. Comparison of our method to existing signal selection methods
To demonstrate that existing Register Transfer Level signal selection methods cannot select messages in system-level flows, we compare our approach with an SRR-based method [4] and a PageRank-based method [18]. We could not apply existing SRR-based methods to the OpenSPARC T2, since these methods are unable to scale; we instead use the smaller USB design [33] for this comparison, considering a usage scenario consisting of two flows. Table V shows that our (mutual information gain based) method selects all of token_pid_sel, data_pid_sel, and the other important interface signals for system-level debugging. SigSeT, on the other hand, selects signals that are not useful for system-level debugging. Our messages are composed of interface signals and achieve a far higher flow specification coverage than the messages composed of the interface signals selected by SigSeT and PRNet.

Fig. 10: Selected message-cause pruning distribution for diagnosis (case studies 1-5): plausible causes vs. pruned causes.

TABLE VII: Diagnosed root causes and debugging statistics for our case studies on OpenSPARC T2 (per case study: number of flows, legal IP pairs, legal IP pairs investigated, messages investigated, and root-caused architecture-level function).

Fig. 11: (a) shows the total number of message aggregate samples for different-length message sequences for the different debugging case studies. (b) and (c) demonstrate that our diagnosis methodology is computationally efficient in terms of runtime and peak memory usage across six different outlier detection algorithms for each of the case studies.

E. Selection of important messages by our method
For evaluation purposes, we use bug coverage as a metric to determine which messages are important. A message is said to be affected by a bug if its value in an execution of the buggy design differs from its value in an execution of the bug-free design. Intuitively, if multiple bugs affect a message, it is highly likely that the message is part of multiple design paths. The bug coverage of a message is defined as the total number of bugs that affect the message, expressed as a fraction of the total number of injected bugs. From a debugging perspective, a message is important if it is affected by very few bugs, implying that the message symptomatizes subtle bugs. Table VI confirms that post-silicon bugs are subtle and tend to affect no more than 4 messages each. Columns 4, 5, and 6 of Table VI show that our method was able to select important messages from the interleaved flow to debug subtle bugs. Table VI also shows that message m15 is affected by four bugs and message m9 is affected by two bugs, but because their sizes are wider than the 32-bit trace buffer, our method does not select them.

F. Effectiveness of selected messages in debugging usage scenarios
F. Effectiveness of selected messages in debugging usage scenarios

Every message is sourced by an IP and reaches a destination IP. Bugs are injected into specific IPs (Table III). During debug, sequences of IPs are explored from the point at which a bug symptom is observed, in order to find the buggy IP. An IP pair (<source IP, destination IP>) is legal if a message is passed between them. We use the number of legal IP pairs investigated during debug as a metric for the selected messages. Table VII shows that we investigated an average of 54.67% of the total legal IP pairs, implying that our selected messages help us focus on a fraction of the legal IP pairs.

To debug a buggy execution, we start with the traced message in which a bug symptom is observed and backtrack to other traced messages. The choice of which traced message to investigate is pseudo-random and guided by the participating flows. Figure 9(a) plots the number of such investigated traced messages and the corresponding candidate legal IP pairs that are eliminated with each traced message. Figure 9(b) shows a similar relationship between the traced messages and the candidate root causes, i.e., the architecture-level functions that might have caused the bug to manifest in the traced messages. Both graphs show that with more traced messages, more candidate legal IP pairs as well as candidate root causes are progressively eliminated. This implies that every one of our traced messages contributes to the debug process; a sketch of this elimination loop appears below.

Figure 10 shows that the traced messages were able to prune out a large number of potential root causes in all five case studies. Our traced messages pruned out an average of 78.89% (max. 88.89%) of candidate root causes.
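A minimal sketch of the elimination loop, assuming each traced message carries hypothetical src_ip and dst_ip fields and that investigating a message retires the corresponding legal IP pair (the actual investigation order is flow-guided, as described above):

def investigate(traced_messages, legal_ip_pairs):
    """Walk back-tracked messages, reporting surviving candidate IP pairs."""
    candidates = set(legal_ip_pairs)
    for msg in traced_messages:                    # pseudo-random, flow-guided order
        candidates.discard((msg.src_ip, msg.dst_ip))   # pair explained by this message
        yield msg, len(candidates)                 # remaining candidates after this step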
Fig. 12: (a), (b), and (c) show that the engineered features (entropy vs. Levenshtein distance) demarcate normal and anomalous message aggregates for Case Studies 1, 3, and 5, each with k = 5.

VIII. EXPERIMENTAL RESULTS ON DEBUG AND DIAGNOSIS
In this section we provide insights into our bug diagnosis methodology by debugging five different buggy case studies across three usage scenarios of the OpenSPARC T2 SoC. For these experiments, we used g = 100000 cycles and varied k from two to the number of valid IP pairs (c.f., Table VII) for each of the case studies. The number of message aggregate samples for message sequences of different lengths, for each outlier detection algorithm and debugging case study, is shown in Figure 11a.
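The following sketch shows one way such aggregates can be built, assuming the trace is a list of (cycle, message) tuples; the g-cycle windowing and the overlapping length-k slicing are our simplifications of the preprocessing described above.

from collections import defaultdict

def message_aggregates(trace, g=100_000, k=2):
    """Split a (cycle, message) trace into g-cycle windows and collect the
    length-k message sequences of each window as one aggregate."""
    windows = defaultdict(list)
    for cycle, msg in trace:
        windows[cycle // g].append(msg)
    return [[tuple(msgs[i:i + k]) for i in range(len(msgs) - k + 1)]
            for _, msgs in sorted(windows.items())]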
A. Computational effort for data preprocessing and outlier message sequence diagnosis

In this experiment, we show the scalability of the automated diagnosis methodology in terms of runtime and peak memory usage. Figure 11b and Figure 11c show the runtime and peak memory usage of the preprocessing step and of the outlier detection algorithms. To calculate the average runtime and average peak memory usage of each outlier detection algorithm, we ran each of them multiple times and averaged the measurements. Preprocessing the trace message data to create message sequence aggregates incurred a runtime of up to 44.3 seconds (average 10.8 seconds) and a peak memory usage of up to 508.7 MB (average 457.73 MB). Running each outlier detection algorithm on the processed message aggregates incurred only up to 18.91 seconds (average 2.77 seconds) of runtime and a peak memory usage of up to 508.2 MB (average 451.27 MB). Since preprocessing incurs up to 443x (average 3x) more runtime than running each outlier detection algorithm, we plot runtime on a log scale in Figure 11b. This experiment shows that our trace-data preprocessing and diagnosis is computationally efficient.
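Measurements of this kind can be reproduced with a simple harness such as the sketch below. This is our own harness, not the authors' tooling (which is not specified), and tracemalloc tracks only Python-heap allocations.

import time
import tracemalloc

def profile(fn, *args, repeats=10):
    """Average wall-clock runtime (seconds) and peak memory (MB) of fn."""
    runtimes, peaks = [], []
    for _ in range(repeats):
        tracemalloc.start()
        t0 = time.perf_counter()
        fn(*args)
        runtimes.append(time.perf_counter() - t0)
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        peaks.append(peak / 2**20)          # bytes -> MB
    return sum(runtimes) / repeats, sum(peaks) / repeats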
B. Validity of entropy and Levenshtein distance as engineered features for outlier message sequence diagnosis
In this experiment, we analyze the effectiveness of entropy and Levenshtein distance in identifying message aggregates that contain anomalous message sequences. In Figure 12 we show the joint probability distribution of entropy and Levenshtein distance, and in Figure 13 we show the minimum, maximum, and average entropy and Levenshtein distance of anomalous message aggregates across message sequences of different lengths for three debugging case studies.

As shown in Figure 12, in the engineered feature space the message aggregates for normal behavior form a dense cluster, whereas anomalous message sequences are sparsely distributed and lie at a distance from the normal message aggregates. Further, Figure 13 shows that message aggregates that contain anomalous message sequences have an entropy of up to 4.3482 (average 2.08) and a Levenshtein distance of up to 3.0 (average 1.5734). This experiment validates that entropy and Levenshtein distance are valuable and effective engineered features for demarcating anomalous message aggregates from normal ones.
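Both engineered features are inexpensive to compute per message aggregate, as the sketch below shows. The choice of reference sequence against which the Levenshtein distance is measured (e.g., the specification-expected message sequence) is our assumption; the paper defines the features formally in an earlier section.

import math
from collections import Counter

def shannon_entropy(seq):
    """Shannon entropy (bits) of the symbol distribution in one sequence [27]."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def levenshtein(a, b):
    """Edit distance between two message sequences [13], two-row DP."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]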
C. Agreement among different outlier detection algorithms in detecting outlier message sequences
In this experiment, we assess the extent of agreement between the anomalies identified by the various outlier detection algorithms (c.f., Section III). Since these algorithms use different methods for outlier detection, we surmise that the confidence in an anomalous message aggregate is higher if multiple outlier detection algorithms identify it as such. For this analysis, we consider the top 10% of anomalous message aggregates per outlier detection algorithm per case study. Our analysis showed that all six outlier detection algorithms agree on a total of six anomalous message aggregates, five algorithms agree on a total of 17 anomalous message aggregates, three algorithms agree on a total of six anomalous message aggregates, and two algorithms agree on a total of six anomalous message aggregates; each of these agreement sets diagnoses a fraction of the injected bugs.
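The agreement tally reduces to counting votes across the detectors' top-10% sets, as in this sketch; detector outputs are assumed to be given as sets of aggregate identifiers.

from collections import Counter

def agreement(top10_by_algo):
    """top10_by_algo: dict algorithm name -> set of aggregate ids in its top 10%.
    Returns a dict: vote count -> set of aggregates receiving that many votes."""
    votes = Counter()
    for ids in top10_by_algo.values():
        votes.update(ids)
    by_count = {}
    for agg, n in votes.items():
        by_count.setdefault(n, set()).add(agg)
    return by_count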
This experiment shows that our engineered features are generic enough to characterize anomalies, such that multiple outlier detection algorithms agree on a large number of anomalies that diagnose multiple bugs. This observation motivated us to use a comprehensive anomaly score to rank message aggregates. We explain the comprehensive anomaly score calculation in Section VIII-F.
Fig. 13: (a), (b), and (c) show that the minimum, maximum, and average values of the engineered features are high for anomalous message aggregates irrespective of message sequence length. <H_min, H_max, H_avg>: minimum, maximum, and average entropy. <L_min, L_max, L_avg>: minimum, maximum, and average Levenshtein distance.

TABLE VIII: Diagnosis statistics for different outlier detection algorithms for different case studies using the OpenSPARC T2 SoC [29], [30]. PCA: Principal Component Analysis [15]. LOF: Local Outlier Factor based algorithm [6]. LkNN: k-nearest neighbor using largest distance as metric [3], [25]. MukNN: k-nearest neighbor using mean distance as metric [3], [25]. OCSVM: one-class Support Vector Machine [26]. D: fraction of injected bugs diagnosed by an outlier detection algorithm. t_p: total number of true positive message sequences. f_p: total number of false positive message sequences (no more than 37% anomalous message sequences). P: precision of an outlier detection algorithm. OS: overall diagnosis statistics for each outlier detection algorithm per debugging case study. Columns (per algorithm IForest, PCA, LOF, LkNN, MukNN, OCSVM): D, t_p, f_p, P.

D. Comparison of precision of different outlier detection algorithms in detecting outlier message sequences
In this experiment, we compare the precision (c.f., Definition 10), recall (c.f., Definition 11), and accuracy (c.f., Definition 12) of each outlier detection algorithm in diagnosing anomalous message sequences per debugging case study. In Table VIII, we show the fraction of injected bugs diagnosed and the number of true positive and false positive candidate anomalous message sequences identified by each outlier detection algorithm per debugging case study. In Table IX, we show the fraction of the total number of injected bugs diagnosed and the total number of true positive, false positive, true negative, and false negative candidate anomalous message sequences identified across all outlier detection algorithms per debugging case study. For this analysis, we considered only the top 10% anomalous message aggregates identified by each outlier detection algorithm per debugging case study.

Our analysis shows that IForest, MukNN, and OCSVM consistently performed better in anomalous message sequence diagnosis than the other three algorithms, PCA, LOF, and LkNN. Each outlier detection algorithm diagnosed up to 100% of the injected bugs. IForest diagnosed on average 73% of injected bugs with a precision of up to 0.8 (average 0.69), MukNN diagnosed on average 67% of injected bugs with a precision of up to 0.77 (average 0.70), and OCSVM diagnosed on average 67% of injected bugs with a precision of up to 0.8 (average 0.74) per debugging case study. On the other hand, PCA diagnosed on average 40% of injected bugs with a precision of up to 0.8 (average 0.69), LOF diagnosed on average 47% of injected bugs with a precision of up to 1.0 (average 0.81), and LkNN diagnosed on average 47% of injected bugs with a precision of up to 0.82 (average 0.76) per debugging case study. Further analysis (c.f., Table IX) shows that our automated diagnosis technique was able to detect up to 100% (average 81.8%) of injected bugs with a precision of up to 0.769 (average 0.756) per debugging case study.

In Table IX, we also show the recall and accuracy per debugging case study. Our diagnosis methodology achieved up to 0.69 (average 0.46) recall and up to 0.56 (average 0.39) accuracy. We note that the recall and accuracy values in Table IX are relatively small. This is because we consider only the top 10% anomalous message aggregates for this analysis. Consequently, the t_p in the numerator is computed from those top 10% anomalous message aggregates, whereas f_n and t_n are computed over the entire set of message aggregates. The numerators are therefore much smaller than the denominators (c.f., Definition 11 and Definition 12), which results in small recall and accuracy values. This experiment shows that our automated diagnosis methodology using engineered features is effective in identifying complex and subtle bugs with high precision.
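For clarity, the three metrics are the standard ones, computed from the t_p, f_p, t_n, and f_n counts reported in Tables VIII and IX:

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

As noted above, restricting t_p to the top-10% aggregates while counting f_n and t_n over all aggregates shrinks the numerators of recall and accuracy relative to their denominators.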
E. Improvement in diagnosis over manual debugging
In this experiment, we analyze the improvement in diagnosis, in terms of the number of injected bugs diagnosed and the diagnosis time, over manual debugging. Table X (columns 7 and 8) summarizes the diagnosis improvement. We were able to diagnose up to 66.7% more injected bugs (average 46.67%) with up to 847x (average 464.35x) less diagnosis time. This experiment shows that our automated bug diagnosis is effective and expedites debugging.

TABLE IX: Overall statistics of automated debugging across all outlier detection algorithms and all case studies. D: fraction of injected bugs detected. P: precision. R: recall. A: accuracy. Columns: Case Study; D; sequence counts t_p, t_n, f_p, f_n; P; R; A.
F. Comprehensive ranking of outlier message sequences
In Section VIII-D, our experimental results showed that IForest, OCSVM, and MukNN are the three most effective of the six outlier detection algorithms for diagnosing useful anomalous message sequences that can help in debugging. Each of IForest, OCSVM, and MukNN (c.f., Section III) detects anomalous message aggregates from a different perspective. IForest selects an anomalous message aggregate based on the shorter path lengths created by random selection of a feature and recursive partitioning of the feature data. OCSVM selects an anomalous message aggregate by solving an optimization problem to find a maximal-margin hyperplane that best separates anomalous message aggregates. MukNN (i.e., k-NN with mean distance as metric) selects an anomalous message aggregate based on the aggregate's local density and the distance to its k-th nearest neighbor.

Consequently, to incorporate these different perspectives into our diagnosis methodology, we use a heuristic combination of the outlier scores from each of the above three algorithms for each message aggregate. We found that a linear combination of the outlier scores of a message aggregate is in closer agreement with our empirical findings than relying on the outlier score from any individual algorithm. Let x be a message aggregate, Ano(x) be the comprehensive outlier score of x, and IForest(x), OCSVM(x), and MukNN(x) be the outlier scores of x under the IForest, OCSVM, and MukNN algorithms, respectively. We define Ano(x) as

Ano(x) = (IForest(x) + OCSVM(x) + MukNN(x)) / 3.

In our experiments, we rank anomalous message aggregates based on the comprehensive outlier score defined above.
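A sketch of this comprehensive scoring using the PyOD toolbox [35] follows. The min-max scaling of each detector's scores before averaging is our assumption, since raw scores from different detectors live on different scales; the feature matrix X (one row of engineered features per message aggregate) is hypothetical.

import numpy as np
from pyod.models.iforest import IForest
from pyod.models.knn import KNN      # method='mean' gives MukNN-style scoring
from pyod.models.ocsvm import OCSVM

def comprehensive_scores(X):
    """Average of scaled outlier scores from IForest, OCSVM, and mean-distance
    kNN, mirroring Ano(x) above."""
    detectors = [IForest(), OCSVM(), KNN(method='mean')]
    scaled = []
    for det in detectors:
        det.fit(X)
        s = det.decision_scores_                       # outlier score per sample
        scaled.append((s - s.min()) / (s.max() - s.min() + 1e-12))
    return np.mean(scaled, axis=0)                     # Ano(x) per aggregate

Ranking the aggregates then amounts to sorting them by this score in descending order, e.g., np.argsort(-comprehensive_scores(X)).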
IX. QUALITATIVE CASE STUDY ON EFFECTIVENESS OF OUR MESSAGE SELECTION AND DIAGNOSIS METHODOLOGY

It is illuminating to walk through a case study to appreciate the effectiveness of the selected messages and our bug detection methodology in the debugging process.
Symptom: In this experiment we used the traced messages from Table XI. The simulation failed with the error message FAIL: Bad Trap.

Manual debug with selected messages:
We consider the bug symptom causes of Table XI to debug this case.

TABLE X: Summary of detection improvements achieved using the automated detection technique over manual debugging. N: number of symptomatic message sequences identified. T: time taken to identify a symptomatic message sequence (hours for manual debugging, seconds for automated). D: improvement in terms of the number of additional detected bugs as a fraction of injected bugs. t: improvement in detection time. A diamond marker denotes "not available".

From the observed trace messages, siincu and piowcrd, we identify that NCU got back the correct credit ID at the end of the PIO read and the PIO write operations, respectively. This rules out two causes out of 9. However, we cannot rule out causes related to the PIO payload, since a wrong payload may cause the computing thread to catch
BAD Trap by requesting an operand from a wrong memory location. The absence of the trace messages mondoacknack and reqtot implies that NCU did not service any Mondo interrupt request and that SIU did not request a Mondo payload transfer to NCU, respectively. Further, there is no message corresponding to dmusiidata.cputhreadid in the trace file, implying that DMU was never able to generate a Mondo interrupt request for NCU to process. This rules out all causes except one, which we explore further to find the root cause.

Manual root causing:
From [29], [30], we note that an interrupt is generated only when DMU has credit and all previous DMA reads are done. We found no prior DMA read messages, and DMU had all of its credit available. The absence of a dmusiidata message with the correct CPUID and ThreadID implies that DMU never generated a Mondo interrupt request. This makes DMU a plausible location of the root cause of the bug.
Debug with bug diagnosis methodology:
We apply our bug diagnosis methodology to the same set of trace messages as before. The methodology identified five anomalous message aggregates containing a total of 26 unique message sequences. We found 20 true positive anomalous message sequences that are symptomatic of different bugs that we injected into the design. Among these 20 anomalous message sequences, 18 were symptomatic of the bug that we identified manually. The remaining two message sequences were symptomatic of the other two injected bugs. Clearly, while debugging manually, we were unable to detect the latter two bugs because i) they were more subtle and ii) their symptomatic message sequences were extremely infrequent. Interestingly, the manual debug took approximately eight hours to diagnose one symptomatic message sequence. In comparison, the automated bug diagnosis methodology took
TABLE XI: Representative potential root causes for one case study. The remaining root causes are omitted due to lack of space; the remaining case studies are available in [32].
Selected Messages | Potential Causes | Potential Implication
reqtot, grant, mondoacknack, siincu, piowcrd | Mondo request forwarded from DMU to SIU's bypass queue instead of the ordered queue | Mondo interrupt not serviced
dmusiidata.cputhreadid | Invalid Mondo payload forwarded to NCU from DMU via SIU | Interrupt assigned to wrong CPU ID and Thread ID

only approximately 62 seconds to pre-process the trace messages and to diagnose candidate anomalous message sequences using the different outlier detection algorithms (Table X quantifies the improvement). Additionally, the diagnosis method was able to diagnose candidate anomalous message sequences for two more bugs, an improvement over manual debugging (c.f., Table X). This case study shows that our bug diagnosis methodology automates and expedites the tedious and error-prone manual debugging process of post-silicon failures.
X. DISCUSSIONS AND CONCLUSION
In light of our experimental findings, we believe that a synergistic application of feature engineering and anomaly detection is a powerful tool for application-level post-silicon debug and diagnosis. Although the two features presented in this work capture a wide range of bugs, we acknowledge that this set of features is not complete and may fail to capture certain application-level bugs. Since our proposed bug diagnosis framework is generic, one may engineer additional features and plug them in to diagnose a wider set of bugs.

In conclusion, we have presented an automated post-silicon bug diagnosis methodology for SoC use-case failures. Our solution uses the power of machine learning and feature engineering to automatically learn the buggy design behavior and the normal design behavior from the trace data by analyzing intrinsic data features, without requiring prior knowledge of the design. Our proposed diagnosis solution is highly effective and can diagnose many more bugs in a fraction of the time, with high precision, as compared to manual debugging. We demonstrate the effectiveness of our proposed diagnosis solution using real-world debugging case studies on the OpenSPARC T2 SoC.
REFERENCES

[1] Y. Abarbanel, E. Singerman, and M. Y. Vardi. Validation of SoC firmware-hardware flows: Challenges and solution directions. In The 51st Annual DAC '14, San Francisco, CA, USA, June 1-5, 2014, pages 2:1–2:4, 2014.
[2] M. Amer, M. Goldstein, and S. Abdennadher. Enhancing one-class support vector machines for unsupervised anomaly detection. In Proceedings of the ACM SIGKDD Workshop on Outlier Detection and Description, pages 8–15. ACM, 2013.
[3] F. Angiulli and C. Pizzuti. Fast outlier detection in high dimensional spaces. In Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery, PKDD '02, pages 15–26, London, UK, 2002. Springer-Verlag.
[4] K. Basu and P. Mishra. Efficient trace signal selection for post silicon validation and debug. In 24th International Conference on VLSI Design, pages 352–357. IEEE, 2011.
[5] Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. arXiv preprint arXiv:1206.5538, 2014.
[6] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander. LOF: Identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, SIGMOD '00, pages 93–104, New York, NY, USA, 2000. ACM.
[7] V. Chandola, A. Banerjee, and V. Kumar. Anomaly detection: A survey. ACM Computing Surveys (CSUR), 41(3):15, 2009.
[8] D. Chatterjee, C. McCarter, and V. Bertacco. Simulation-based signal selection for state restoration in silicon debug. In 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pages 595–601. IEEE, 2011.
[9] R. Fraer, D. Keren, Z. Khasidashvili, A. Novakovsky, A. Puder, E. Singerman, E. Talmor, M. Y. Vardi, and J. Yang. From visual to logical formalisms for SoC validation. In Twelfth ACM/IEEE MEMOCODE 2014, Lausanne, Switzerland, October 19-21, 2014, pages 165–174, 2014.
[10] M. Goldstein and S. Uchida. A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data. PLoS ONE, 11(4):e0152173, 2016.
[11] R. W. Hamming. Error detecting and error correcting codes. The Bell System Technical Journal, 29(2):147–160, April 1950.
[12] J. Heaton. An empirical analysis of feature engineering for predictive modeling. In SoutheastCon 2016, 2016.
[13] Levenshtein distance. https://en.wikipedia.org/wiki/Levenshtein_distance.
[14] D. Lin, T. Hong, Y. Li, S. Kumar, F. Fallah, N. Hakim, D. Gardner, S. Mitra, et al. Effective post-silicon validation of system-on-chips using quick error detection. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 33(10):1573–1590, 2014.
[15] M.-L. Shyu, S.-C. Chen, K. Sarinnapakorn, and L. Chang. A novel anomaly detection scheme based on principal component classifier. In Proceedings of the IEEE Foundations and New Directions of Data Mining Workshop, in conjunction with the Third IEEE International Conference on Data Mining (ICDM '03), pages 172–179, 2003.
[16] F. T. Liu, K. M. Ting, and Z.-H. Zhou. Isolation forest. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, ICDM '08, pages 413–422, Washington, DC, USA, 2008. IEEE Computer Society.
[17] F. T. Liu, K. M. Ting, and Z.-H. Zhou. Isolation-based anomaly detection. ACM Trans. Knowl. Discov. Data, 6(1):3:1–3:39, March 2012.
[18] S. Ma, D. Pal, R. Jiang, S. Ray, and S. Vasudevan. Can't see the forest for the trees: State restoration's limitations in post-silicon trace signal selection. In Proceedings of ICCAD 2015, Austin, TX, USA, November 2-6, 2015, pages 1–8, 2015.
[19] T. M. Mitchell. Machine Learning. McGraw-Hill, New York, NY, USA, 1st edition, 1997.
[20] S. Mitra, S. A. Seshia, and N. Nicolici. Post-silicon validation opportunities, challenges and recent advances. In Proceedings of the 47th Design Automation Conference, DAC '10, pages 12–17, New York, NY, USA, 2010. ACM.
[21] K. P. Murphy. Machine Learning: A Probabilistic Perspective. Adaptive Computation and Machine Learning series. MIT Press, 2012.
[22] D. Pal, A. Sharma, S. Ray, F. M. de Paula, and S. Vasudevan. Application level hardware tracing for scaling post-silicon debug. In Proceedings of the 55th Annual Design Automation Conference, DAC 2018, San Francisco, CA, USA, June 24-29, 2018, pages 92:1–92:6, 2018.
[23] P. Patra. On the cusp of a validation wall. IEEE Design and Test of Computers, 24(2):193–196, 2007.
[24] K. Rahmani, S. Ray, and P. Mishra. Postsilicon trace signal selection using machine learning techniques. IEEE Trans. VLSI Syst., 25(2):570–580, 2017.
[25] S. Ramaswamy, R. Rastogi, and K. Shim. Efficient algorithms for mining outliers from large data sets. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, SIGMOD '00, pages 427–438, New York, NY, USA, 2000. ACM.
[26] B. Schölkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, 13(7):1443–1471, July 2001.
[27] C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27(3):379–423, 1948.
[28] E. Singerman, Y. Abarbanel, and S. Baartmans. Transaction based pre-to-post silicon validation. In Proceedings of the 48th DAC 2011, San Diego, California, USA, June 5-10, 2011.
[31] FMCAD 2015, Austin, Texas, USA, September 27-30, 2015, pages 168–175, 2015.
[32] Zoom Out and See Better: Scalable Message Tracing for Post-Silicon SoC Debug, 2017. http://hdl.handle.net/2142/98857.
[33] USB 2.0, 2008. http://opencores.org/project,usb.
[34] S. Yerramilli. Addressing Post-Silicon Validation Challenge: Leverage Validation and Test Synergy. Keynote, International Test Conference, 2006.
[35] Y. Zhao, Z. Nasrullah, and Z. Li. PyOD: A Python toolbox for scalable outlier detection. arXiv preprint arXiv:1901.01588, 2019.