A Post-Silicon Trace Analysis Approach for System-on-Chip Protocol Debug

Abstract

Reconstructing system-level behavior from silicon traces is a critical problem in post-silicon validation of System-on-Chip designs. Current industrial practice in this area is primarily manual, depending on collaborative insights of the architects, designers, and validators. This paper presents a trace analysis approach that exploits architectural models of the system-level protocols to reconstruct design behavior from partially observed silicon traces in the presence of ambiguous and noisy data. The output of the approach is a set of all potential interpretations of a system's internal executions abstracted to the system-level protocols. To support the trace analysis approach, a companion trace signal selection framework guided by system-level protocols is also presented, and its impacts on the complexity and accuracy of the analysis approach are discussed. That approach and the framework have been evaluated on a multi-core system-on-chip prototype that implements a set of common industrial system-level protocols.



Yuting Cao, Hao Zheng

Computer Science & Engineering, U. South Florida, Tampa, FL 33620
{cao2, haozheng}@usf.edu

Sandip Ray, Jin Yang

Strategic CAD Lab, Intel, Hillsboro, OR
{sandip.ray, jin.yang}@intel.com


1. INTRODUCTION

Post-silicon validation makes use of pre-production silicon integrated circuit (IC) to ensure that the fabricated system works as desired under actual operating conditions with real software. It is a critical component of the design validation life-cycle for modern microprocessors and system-on-chip (SoC) designs. Unfortunately, it is also highly complex, performed under aggressive schedules and accounting for more than 50% of the overall design validation cost [12].

An SoC design is often composed of a large number of pre-designed hardware or software blocks (often referred to as "intellectual properties" or "IPs") that coordinate through complex protocols to implement system-level behavior [6]. An execution trace of a system typically involves activities from the CPU, audio controller, display controller, wireless radio antenna, etc., reflecting the interleaved execution of a potentially large number of communication protocols. As SoCs integrate more IPs, the interactions among the IPs are increasingly more complex. Moreover, modern interconnects are highly concurrent allowing multiple transactions to be processed simultaneously for scalability and performance. They are an important source of design errors. On the other hand, observability limitations allow only a small number of participating signals to be actually traced during silicon execution. Furthermore, electrical perturbations cause silicon data to be noisy, lossy, and ambiguous. It is non-trivial during post-silicon debug to identify all participating protocols and pinpoint the interleavings that result in an observed trace.

Previous work [15] proposed a method for correlating silicon traces with system-level protocol specifications. The idea was to reconstruct protocol execution scenarios from a partially observed silicon trace, which provide abstract views of system internal executions to facilitate post-silicon SoC debug. While that work showed promising results, it has a number of deficiencies precluding its applicability in practice. First, there was no way to qualify or rank the quality of protocol execution scenarios generated by the reconstruction procedure. Under poor observability condition, it was possible for the algorithm to generate hundreds or thousands of potential protocol execution scenarios consistent with a partially observed trace. Without a metric to rank the quality of these reconstructions, the debugger is faced with the unenviable task of wading through these potential scenarios to infer what may actually have happened in a specific silicon execution. Moreover, based on past experiences, interleavings of different protocol executions are a major source of functional bugs. Since the method developed in [15] does not capture orderings among different protocol executions, the results obtained with that method offer little help for bug localization and root causing.

This paper addresses the above deficiencies by introducing an optimized trace analysis approach. Central to this optimized approach is a new formulation of protocol execution scenarios that comprehends ordering relations among protocol executions. Quantitative metrics are also developed so that the quality of the results derived by and the efficiency of the analysis approach can be measured. Trace signal selections can have great impacts on the complexity and accuracy of the trace analysis. Therefore, a companion trace signal selection framework is proposed. This framework is communication-centric, and guided by system-level protocols. Its objective is to facilitate the trace analysis to produce high quality interpretations of observed silicon traces efficiently. Various trace signal selection strategies are evaluated and analyzed based on their impacts on the trace analysis approach applied to a non-trivial multi-core SoC model that implements a number of common industrial system-level protocols.

2. FLOW SPECIFICATION

An SoC model as shown in Figure 1 is used to illustrate and experiment the work described in this paper. It consists of two CPUs (CPU X), each with a private Data Cache (Cache X), a graphics engine (GFX), a power management unit (PMU), a system memory, and three peripheral blocks: an audio control unit (Audio), a UART controller (UART), and a USB controller (USB). All these blocks are connected through an interconnect fabric (Bus).

Figure 1: The block diagram of the simple SoC model. (Blocks shown: CPU 0, CPU 1, Cache 0, Cache 1, PMU, GFX, Bus, Audio, USB, UART, Memory.)

System operations are realized by executions performed in various blocks that are coordinated by system-level protocols. These protocols are typically specified in architecture documents as message flow diagrams, where the words "protocol" and "flow" are used interchangeably. In this paper, as in [15], system flows are formalized using Labeled Petri-nets (LPNs). Figure 2 shows a memory write protocol initiated from a CPU


CPU_X in LPN where $X \in \{0, 1\}$ and $X' = 1 - X$.

An LPN is a tuple $(P, T, E, L, s_0)$ where $P$ is a finite set of places, $T$ is a finite set of transitions, $E$ is a finite set of events, and $L : T \to E$ is a labeling function that maps each transition $t \in T$ to an event $e \in E$. For each transition $t \in T$, its preset, denoted as $\bullet t \subseteq P$, is the set of places connected to $t$, and its postset, denoted as $t\bullet \subseteq P$, is the set of places that $t$ is connected to. A state $s \subseteq P$ of an LPN is a subset of places marked with tokens. There are two special states associated with each LPN: $s_0 \subseteq P$, which is the set of initially marked places, also referred to as the initial state, and the end state $s_{end}$, which is the set of places not going to any transitions.

A transition $t$ can be executed in a state $s$ if $\bullet t \subseteq s$. Executing $t$ causes the labeled event to be emitted, and leads to a new state $s' = (s - \bullet t) \cup t\bullet$. Therefore, executing an LPN leads to a sequence of events. Execution of an LPN completes if its $s_{end}$ is reached.

For example, in Figure 2, t can be executed in s = { p }. Event (CPU_X : Cache_X : wr_req) is emitted after t is executed, and the LPN state becomes { p }. The end state is s end = { p }.

A flow specification may also contain multiple branches describing different ways a system can execute such flow. For example, the flow shown in Figure 2 has three branches covering the cases where the cache (snoop) operation is hit or miss.
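The firing rule above can be illustrated with a minimal Python sketch. The names `Transition`, `LPN`, `enabled`, `fire`, and `run`, as well as the two-transition request/response flow and its place names, are assumptions made for illustration; they are not the flow of Figure 2 and not part of any tool described in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    name: str
    preset: frozenset    # places that must be marked for the transition to fire
    postset: frozenset   # places that become marked after firing
    event: tuple         # label (src, dest, cmd)


@dataclass(frozen=True)
class LPN:
    transitions: tuple
    s0: frozenset        # initial state
    s_end: frozenset     # end state


def enabled(lpn, state):
    """Transitions t whose preset is contained in the current state."""
    return [t for t in lpn.transitions if t.preset <= state]


def fire(state, t):
    """New state s' = (s - preset(t)) | postset(t)."""
    if not t.preset <= state:
        raise ValueError(f"{t.name} is not enabled in {set(state)}")
    return (state - t.preset) | t.postset


def run(lpn):
    """Execute the LPN from s0 to s_end and collect the emitted events.
    Simplification: the first enabled transition is always chosen."""
    state, events = lpn.s0, []
    while state != lpn.s_end:
        candidates = enabled(lpn, state)
        if not candidates:
            raise RuntimeError("no enabled transition before reaching s_end")
        t = candidates[0]
        events.append(t.event)
        state = fire(state, t)
    return events


# Hypothetical two-step flow: a write request followed by a write response.
req = Transition("t_req", frozenset({"p_a"}), frozenset({"p_b"}),
                 ("CPU_0", "Cache_0", "wr_req"))
resp = Transition("t_resp", frozenset({"p_b"}), frozenset({"p_c"}),
                  ("Cache_0", "CPU_0", "wr_resp"))
flow = LPN((req, resp), frozenset({"p_a"}), frozenset({"p_c"}))
```

Calling `run(flow)` yields the event sequence of one complete execution, which is the kind of sequence the trace analysis later tries to match against observed events.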

3. POST-SILICON TRACE ANALYSIS

3.1 Previous Work

This section recaps the previous approach in [15]. The objective of the trace analysis is to reconstruct design internal behavior wrt given system-level flow specifications F from a partially observed silicon trace on a small number of hardware signals. The off-chip analysis includes two broad phases: (1) trace abstraction, which maps a silicon trace into a sequence of flow events, higher-level architectural constructs including e.g., messages, operations, etc., and (2) trace interpretation, which infers possible flow execution scenarios that are compliant with the abstracted event sequence.

Figure 2: LPN formalization of a CPU write protocol. Each LPN transition is labeled with an event (src, dest, cmd) where cmd is a command sent from a source component src to a destination component dest. The places without outgoing edges are terminals, which indicate termination of protocols represented by the LPNs. (Transition labels shown, in order: (CPU_X : Cache_X : wr_req), (Cache_X : Cache_X : snp_wr_req), (Cache_X : Cache_X : snp_wr_resp), (Cache_X : Bus : wr_req), (Bus : Mem : rd_req), (Mem : Bus : rd_resp), (Bus : Cache_X : wr_resp), and three transitions labeled (Cache_X : CPU_X : wr_resp).)

To illustrate the basic idea, consider the system flow in Figure 2, which we call F . Suppose that the following flow execution trace is abstracted from an observed silicon trace by executing a design that implements F .

t t t t t t t t t t t t . . .

Here the flow events are referred to by their transition names in the LPN. The first four events result in the following flow execution scenario

{ ( F , , { p } ) , ( F , , { p } ) } .   (1)

A flow execution scenario is defined as a set of flow instances and their respective current states after some events are processed [15]. It can be viewed as an abstraction of system states wrt system flows.
The above execution scenario indicates that the sequence of the first four events is a result from executing those two flow instances of F from their initial states to the shown states. For the first event t , it may be a result from executing F , or F , , but exactly which one is unknown due to limited observability. Both possible cases are considered, and two execution scenarios below are derived as a result from interpreting t .

{ ( F , , { p } ) , ( F , , { p } ) }
{ ( F , , { p } ) , ( F , , { p } ) } .   (2)

After handling the next event t , the above two execution scenarios are reduced to the one as shown below.

{ ( F , , { p } ) , ( F , , { p } ) } .

After the remaining six events are handled, the following execution scenario is derived.

{ ( F , , { p } ) , ( F , , { p } ) }

As another example, now suppose that the design with a bug generates the flow trace below.

t t t t t t t t t t t t . . . .

This sequence is almost the same as the previous one except that the last event is t : (Cache_X:CPU_X:rd_resp) instead of t : (Mem:Bus:rd_resp) in the previous trace. t is an event used in a different flow specification describing a CPU memory read protocol. Analyzing the trace right before t leads to the execution scenarios below.

{ ( F , , { p } ) , ( F , , { p } ) }
{ ( F , , { p } ) , ( F , , { p } ) } .   (3)

However, t cannot be a result from executing either flow instances in both scenarios, which indicates a noncompliance of the design implementation with respect to the given flow specification. Such an event is referred to as being inconsistent. In this case, the algorithm halts, and returns t and the derived flow execution scenarios as shown in (3) for debugger to examine further.

The trace analysis approach in [15] does not capture orderings among flow instances for execution scenarios. However, from a debugger's point of view, communication protocols can be related. For example, a firmware loading protocol always happens before a firmware execution protocol. If a firmware execution protocol is found to happen before a firmware loading protocol, that possibly indicates an error in the system implementing such protocols.
Such properties cannot be checked by the previous approach.

To address that problem, this paper presents a new definition of flow execution scenarios as

$$\{ (F_{i,j},\ s_{i,j},\ \mathit{start}_{i,j},\ \mathit{end}_{i,j}) \mid F_i \in F \}$$

where $\mathit{start}_{i,j}$ and $\mathit{end}_{i,j}$ are two indices representing relative time when $F_{i,j}$ is initiated and completed. The ordering relations can be derived by comparing their start and end indices. For example, for two flow instances in an execution scenario, $(F_{u,v}, s_{u,v}, \mathit{start}_{u,v}, \mathit{end}_{u,v})$ and $(F_{x,y}, s_{x,y}, \mathit{start}_{x,y}, \mathit{end}_{x,y})$, $F_{u,v}$ is initiated before $F_{x,y}$ if $\mathit{start}_{u,v} < \mathit{start}_{x,y}$, or $F_{x,y}$ is initiated after $F_{u,v}$ is completed if $\mathit{end}_{u,v} < \mathit{start}_{x,y}$. The ordering relations can provide more accurate information for understanding system execution under limited observability. Section 3.3 explains how $\mathit{start}_{i,j}$ and $\mathit{end}_{i,j}$ are decided during the trace analysis.

In order to support the new definition of flow execution scenarios, the trace abstraction, which maps an observed silicon trace to a linear sequence of flow events as in [15], is also generalized. An SoC design can be viewed as a group of IP blocks networked by an on-chip interconnect fabric. These blocks communicate with each other through communication links, each of which implements a protocol, such as ARM AXI, over a set of wires. The approach presented in this paper is communication centric in that it works on silicon traces on a selected number of wires of a selected number of communication links for observation. Suppose that there are $n$ communication links, and some wires from each link are selected for observation. A silicon trace is assumed to be a sequence of vectors $\alpha_i$, each defined as

$$\alpha_i = \langle \alpha_{1,i}, \ldots, \alpha_{n,i} \rangle$$

where $\alpha_{k,i}$ is a state on link $k$ in step $i$.

If all wires of a link are observable, then a state on that link can be uniquely mapped to a flow event of the same link. Under limited observability, a state on a link is typically mapped to a set of flow events. Therefore, a silicon trace is abstracted to a sequence of vectors $\vec{E}_i$ where

$$\vec{E}_i = \langle E_{1,i}, \ldots, E_{n,i} \rangle \qquad (4)$$

is a vector of sets of flow events abstracted from $\alpha_i$, and each $E_{k,i}$ in $\vec{E}_i$ is a set of flow events abstracted from state $\alpha_{k,i}$ in $\alpha_i$. No temporal orderings exist among all events in $\vec{E}_i$. On the other hand, for two events, $e_i \in \vec{E}_i$ and $e_j \in \vec{E}_j$ such that $i < j$, $e_i$ happens before $e_j$.

Based on different levels of information captured, this paper classifies flow execution scenarios as follows.

• Type-1 execution scenarios capture the number of instances of each flow specification initiated from a silicon trace, and their relative orderings of initiations.

• Type-2 execution scenarios, on top of what is captured by Type-1 scenarios, capture completion of each flow instance. This additional information can be used to identify potential problems if there is any flow instance that is not completed. Furthermore, Type-2 execution scenarios capture the relative orderings among all flow instances as described above.

• Type-3 execution scenarios, on top of what is captured by Type-2 scenarios, capture information on execution paths followed by individual flow instances. This information can provide a means to debuggers to have a detailed examination on how each flow instance is executed.

These different execution scenarios can be used to provide different views of system execution, from coarse-grained to more detailed ones, at different stages of debug.
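The ordering relations derived from start and end indices can be sketched in a few lines of Python. The `FlowInstance` record, the helper names, the firmware flow names, and the use of -1 as the end index of an instance that has not completed are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowInstance:
    flow: str        # name of the flow specification F_i
    start: int       # trace index at which the instance was initiated
    end: int = -1    # trace index at which it completed; -1 assumed to mean "not completed"


def initiated_before(a, b):
    """a is initiated before b if start_a < start_b."""
    return a.start < b.start


def completed_before_start(a, b):
    """b is initiated after a is completed if end_a < start_b.
    The guard excludes instances of a that never completed."""
    return a.end != -1 and a.end < b.start


def incomplete(scenario):
    """Flow instances without a recorded completion (information of Type-2 scenarios)."""
    return [inst for inst in scenario if inst.end == -1]


# Hypothetical scenario: firmware loading completes before firmware execution starts.
load = FlowInstance("fw_load", start=0, end=5)
run_fw = FlowInstance("fw_exec", start=7)
scenario = [load, run_fw]
```

With such a record, a property like "firmware loading always happens before firmware execution" reduces to checking `completed_before_start(load, run_fw)` on each derived scenario.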

Algorithm 1 shows the top-level procedure for detecting internal flow executions based on a partially observed silicon trace, and checks the compliance wrt a given flow specification. It takes as inputs F, a set of system level flow specifications, and a signal trace ρ, which is assumed to be a sequence of states on a set of observable trace signals, and each state is uniquely indexed starting from 0.

This algorithm scans trace ρ starting from index h initialized to 0, extracts all possible flow events from ρ at index h as described in section 3.2 (line 6), and maps each of those extracted flow events to update already detected execution scenarios (line 11). The algorithm terminates if one of two conditions holds. If an inconsistency is encountered, the set M together with h and i is returned (line 16). Index h provides temporal information on when the inconsistency occurs, while i provides spatial information on which communication link an inconsistent event is transmitted. If no inconsistency is found, the set of all execution scenarios compliant with the observed trace is returned (line 18) when index h is larger than the length of the trace.

Algorithm 1: Check-Compliance(F, ρ)
 1  /* F: a set of flow specification */
 2  /* ρ: a partially observed silicon trace */
 3  M ← {∅}
 4  h ← 0
 5  while h ≤ |ρ| do
 6      E⃗ ← abstract(ρ, h)
 7      foreach E_i of E⃗ do
 8          inconsistent ← true
 9          foreach e ∈ E_i do
10              foreach scen ∈ M do
11                  Scens ← analysis(F, scen, e, h)
12                  if Scens ≠ ∅ then
13                      inconsistent ← false
14                      M ← (M − scen) ∪ Scens
15          if inconsistent = true then
16              return (M, h, i)
17      h ← h + 1
18  return (M, −1, −1)

Algorithm 2 takes the specification F, an execution scenario scen, a flow event e, and index h of the trace where e is extracted, and it produces a set of execution scenarios R consistent with e. This algorithm performs two tasks. In the first task (lines 5-12), the algorithm checks every flow instance to decide if e can be accepted.
If such an instance is found (line 7), then it is updated with the new state as the result of e (line 9). Furthermore, if e causes the flow instance to complete, its index $\mathit{end}_{i,j}$ is set to h (line 10-11), indicating the completion of that instance due to event e at step h of the trace. In task 2, all possibilities where e can initiate a new flow instance are considered (line 14-20). If a new instance can be initiated, its $\mathit{start}_{i,j}$ is set to h, indicating the initiation of that instance due to a signal event at step h of the trace.

Due to the limited observability, reconstructing system level executions from an observed silicon trace is an imprecise process. The large number of execution scenarios typically derived during the analysis would take large amounts of runtime and memory to process and to store, thus making it less efficient. This is referred to as the complexity problem of the trace analysis. After the analysis is done, a large number of derived execution scenarios make it difficult to understand the analysis results, thus being less helpful for debugging. Obviously, a single flow execution scenario derived at the end of the trace analysis provides much more precise information for debug than ten candidate flow execution scenarios. This is referred to as the accuracy problem of the trace analysis.

Algorithm 2: Analysis(F, scen, e, h)
 1  /* scen = {(F_{i,j}, s_{i,j}, start_{i,j}, end_{i,j})} */
 2  /* e is a flow event abstracted from silicon trace at index h */
 3  R = ∅
 4  /* Check if e can change state any existing flow instances of scen */
 5  foreach (F_{i,j}, s_{i,j}, start_{i,j}, end_{i,j}) ∈ scen do
 6      s′_{i,j} ← accept(F_{i,j}, s_{i,j}, e)
 7      if s′_{i,j} ≠ ∅ then
 8          Let scen′ be a copy of scen
 9          Replace s_{i,j} of scen′ with s′_{i,j}
10          if s′_{i,j} = F_i.s_end then
11              Update end_{i,j} of scen′ with h
12          R ← R ∪ scen′
13  /* Check if e can extend scen by initiating new flow instances */
14  foreach F_i ∈ F do
15      create a new instance F_{i,h}
16      s′_{i,h} ← accept(F_{i,h}, F_i.s_0, e)
17      if s′_{i,h} ≠ ∅ then
18          Let scen′ be a copy of scen
19          scen′ ← scen′ ∪ (F_{i,h}, s′_{i,h}, h, −1)
20          R ← R ∪ scen′
21  return R

The contributing factors to the complexity and accuracy problems are explained below.

1. A signal event mapped to a set of flow events: Due to the limited observability, a signal event of an observed silicon trace is often interpreted as a number of different flow events, which typically leads to derivation of a number of different execution scenarios. This situation is exacerbated by the fact that silicon traces are often very long, which could lead to excessively large numbers of possible execution scenarios derived during or at the end of the analysis.

2. A flow event mapped to different temporal flow instances: Temporal flow instances refer to the flow instances activated by the same component, e.g. read/write flows activated by CPU_0. If several temporal instances of some flows are activated by a component, mapping flow events to those flow instances can be ambiguous. For example, suppose that an execution scenario includes two instances of the flow as shown in Figure 2 activated by CPU_0, one in state { p }, and the other one in state { p }. An instance of flow event (Cache_0 : CPU_0 : wr_resp) can be mapped to either flow instance leading to two new execution scenarios from the current one.

3. A flow event mapped to flow instances activated by different components: This situation can happen when flow instances that share some common events are activated by different components. For example, suppose an execution scenario has two instances of the flow as shown in Figure 2, one activated by CPU_0 and the other one by CPU_1, and both are in state { p }. A flow event (Mem : Bus : rd_resp) can be mapped to either one of these two instances, leading to two new execution scenarios derived from the current one.

The above issues can be mitigated by good signal selections to be discussed in the following section. In order to evaluate the impacts of different trace signal selections on the complexity and accuracy of the trace analysis, this paper introduces two quantitative metrics. The complexity is measured by the peak count of flow execution scenarios encountered during the analysis process, i.e., the largest size of M encountered during the execution of Algorithm 1. The accuracy is measured by the final count of flow execution scenarios derived at the end of the analysis process, i.e., the size of M returned on either line 16 or 18 of Algorithm 1.
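Algorithms 1 and 2, together with the two metrics, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the `Flow` record, the string event names, the two-transition `write` flow, and the input format (an already abstracted trace in which `trace[h][i]` is the set of candidate events on link i at step h) are all hypothetical. Two further simplifications are assumed: the scenario set is rebuilt for each link instead of being updated in place, so scenarios that cannot explain the observed event set are dropped, and a link with an empty candidate set is treated as idle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    transitions: tuple  # entries (preset, postset, event); presets/postsets are frozensets
    s0: tuple           # initial state as a sorted tuple of place names
    s_end: tuple        # end state as a sorted tuple of place names


def accept(flow, state, event):
    """Successor state if `event` can fire in `state`, else None.
    Simplification: only the first matching transition is taken."""
    marked = frozenset(state)
    for preset, postset, label in flow.transitions:
        if label == event and preset <= marked:
            return tuple(sorted((marked - preset) | postset))
    return None


def analysis(flows, scen, e, h):
    """Algorithm 2: scenarios derived from `scen` by explaining event e at index h.
    A scenario is a sorted tuple of instances (flow name, state, start, end)."""
    result = set()
    # Task 1: an existing flow instance consumes e.
    for k, (name, state, start, end) in enumerate(scen):
        nxt = accept(flows[name], state, e)
        if nxt is not None:
            if nxt == flows[name].s_end:
                end = h  # the instance completes at step h
            rest = scen[:k] + scen[k + 1:]
            result.add(tuple(sorted(rest + ((name, nxt, start, end),))))
    # Task 2: e initiates a new instance of some flow (end index -1 = not completed).
    for name, flow in flows.items():
        nxt = accept(flow, flow.s0, e)
        if nxt is not None:
            result.add(tuple(sorted(scen + ((name, nxt, h, -1),))))
    return result


def check_compliance(flows, trace):
    """Algorithm 1 over an abstracted trace. Returns (M, h, i, peak);
    (h, i) == (-1, -1) means that no inconsistency was found."""
    scenarios = {()}          # M starts with one empty scenario
    peak = len(scenarios)     # complexity metric: largest size of M seen so far
    for h, step in enumerate(trace):
        for i, candidates in enumerate(step):
            if not candidates:  # assumption: empty set means link i was idle
                continue
            updated = set()
            for scen in scenarios:
                for e in candidates:
                    updated |= analysis(flows, scen, e, h)
            if not updated:     # no scenario can explain any candidate event
                return scenarios, h, i, peak
            scenarios = updated
            peak = max(peak, len(scenarios))
    return scenarios, -1, -1, peak


# Hypothetical write flow: p0 --wr_req--> p1 --wr_resp--> p2
write = Flow(
    transitions=(
        (frozenset({"p0"}), frozenset({"p1"}), "wr_req"),
        (frozenset({"p1"}), frozenset({"p2"}), "wr_resp"),
    ),
    s0=("p0",),
    s_end=("p2",),
)
flows = {"write": write}

# Two requests followed by two responses on a single observed link.
trace = [[{"wr_req"}], [{"wr_req"}], [{"wr_resp"}], [{"wr_resp"}]]
final, h, i, peak = check_compliance(flows, trace)
```

In this example each response can be attributed to either outstanding write instance, which is the temporal-instance ambiguity of item 2 above. Here `peak` plays the role of the complexity metric and `len(final)` the role of the accuracy metric.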

4. TRACE SIGNAL SELECTION

Trace signal selection is a critical step in post-silicon debug. It includes two different efforts: pre-silicon and post-silicon. During pre-silicon selection, a few thousand signals among a vast number of internal signals are tapped for observation. All necessary signals must be selected at this stage, otherwise, expensive re-design along with silicon re-spin are required. During post-silicon debug, a small subset of those tapped signals are routed to the chip interface for tracing during system execution.

Previous work such as [4] is typically applied to gate level design models, and the quality of the results is evaluated by the commonly used state restoration ratio. However, it is difficult to scale those methods to large and complex SoC designs. More importantly, signals selected at the gate level are often irrelevant to system-level functionalities. There is an attempt to raise the abstraction level for trace signal selection to the register transfer level (RTL) guided by assertions [11], however that work does not consider system level functionalities either. In [3], a system level protocol guided approach is proposed. It is similar to our work in that both are based on system level protocols. However, the selection techniques developed in [3] are simple and irrelevant to understanding silicon traces at the system level, and the evaluation was performed on an abstract transaction level model.

This section introduces a framework shown in Figure 3 for trace signal selection guided by system-level protocols. Due to the page limit, this paper only considers the pre-silicon trace signal selection. Since the pre-silicon selection needs to support all types of execution scenarios, it is sufficient to consider only Type-3 scenarios as they supersede Type-1 or -2 scenarios.

During the system level selection, different subsets of flow events for observation are selected from given flow specifications. Then, those results are passed to the more refined bit level selection.

To support Type-3 scenarios, the start and end events of all flow specifications must be selected. If a flow specification has multiple branches, additional events may need to be selected so that the branch followed by a flow instance during system execution can be captured. Figure 4 shows two examples of different branching structures for flows.

• In Figure 4(a), each branch ends with a unique event. There is no need to select additional events as observing different end events can clearly identify the branch followed during system execution.


Figure 3: A framework of trace signal selection. (Blocks shown: Flow Specification F, System-Level Selection, Sets of Flow Events, RTL Model, Bit-Level Selection, Trace Signals.)


Figure 4: Examples of flow structures, (a) and (b).

• In Figure 4(b), branches split and then join, and the flow ends with a common event. In this case, a unique event needs to be selected for each branch.

The flow shown in Figure 2 has three branches with a structure similar to Figure 4(b). Its start and end events are { t , t , t , t }. Note that t , t , and t actually refer to the same event. There is no choice for the right branch as t must be selected. To identify the left two branches from the right one, either t or t needs to be selected. Similarly, one of the events in { t , t , t , t } needs to be selected in order to identify the left branch from the middle one. Therefore, all possible event selections for that flow are { t , t , t , t } × { t , t } × { t , t , t , t }.

Among a possibly large number of event selections, there are two types of events that have interesting characteristics. One type includes events that are unique to specific flows (referred to as unique events), while the other type includes events shared by multiple flows (referred to as shared events). This section considers their impacts on the complexity and accuracy of the trace analysis and the signal selection. For the complexity and accuracy of the trace analysis, only issues […] t and t are used only in that flow. During the trace analysis, they are just mapped to the instances of that particular flow. […] t and t have a similar characteristic.

On the other hand, events t and t are used in many different flows of different components. Those flows can be read/write flows of CPU_0 or CPU_1. During the trace analysis, if there are multiple instances of such flows, it is impossible to know which of those flow instances caused those events to be generated. Therefore, the analysis algorithm has to map those events to those flow instances in all possible ways. That can have a large negative impact on the complexity and accuracy of the trace analysis.

In terms of trace signal selection, those two types of events can lead to different results. If unique events are selected, then the total number of events selected can be large, and as a result, a large number of trace signals need to be selected in order to observe those events. On the other hand, the total number of events can be smaller if shared events are selected. That leads to a smaller number of trace signals that need to be selected. The negative impacts of selecting shared events can also be mitigated if certain implementation details are available. The next section discusses that point further.
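The system-level selection described above can be illustrated with a minimal Python sketch. It is not the authors' tool: the flow names, the event names, and the helper functions `classify_events` and `enumerate_selections` are hypothetical, and the sketch assumes that start and end events are mandatory while exactly one event is picked from each branch-identifying choice set.

```python
from itertools import product


def classify_events(flows):
    """Split events into those used by exactly one flow and those used by several."""
    users = {}
    for flow_name, events in flows.items():
        for event in events:
            users.setdefault(event, set()).add(flow_name)
    unique = {e for e, f in users.items() if len(f) == 1}
    shared = {e for e, f in users.items() if len(f) > 1}
    return unique, shared


def enumerate_selections(mandatory, branch_choices):
    """Combine the mandatory events with one pick from every choice set."""
    selections = []
    for picks in product(*(sorted(c) for c in branch_choices)):
        selections.append(frozenset(mandatory) | frozenset(picks))
    return selections


# Hypothetical flows and event names, for illustration only.
flows = {
    "cpu0_read": {"cpu0_req", "bus_mem_req", "mem_bus_resp", "cpu0_resp"},
    "cpu1_read": {"cpu1_req", "bus_mem_req", "mem_bus_resp", "cpu1_resp"},
}
unique, shared = classify_events(flows)

selections = enumerate_selections(
    mandatory={"cpu0_req", "cpu0_resp"},
    branch_choices=[{"e_a", "e_b"}, {"e_c", "e_d", "e_e", "e_f"}],
)
print(len(selections))  # 2 * 4 = 8 candidate selections
```

Under these assumptions the number of candidate selections is the product of the sizes of the choice sets, which is why the set of selections can grow quickly for flows with many branches.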

The bit-level selection takes as inputs the set of event selections produced in the previous step and an RTL model that implements the system flow specifications, and performs two tasks for each event selection:

1. Evaluate its quality with respect to the three issues discussed in Section 3.4;
2. Choose one selection, and generate a set of candidate trace signals that implement the selected events.

The ultimate goal of the bit-level selection is to produce a reduced set of candidate trace signals optimized for the trace analysis approach. Since the bit-level selection depends on implementation specifics, this section can only discuss some general guidelines and tradeoffs. Note that flow specifications are typically independent of memory address and data information. Therefore, the address and data bits included in event implementations can generally be ignored.

Signals that implement the Cmd field of flow events are selected based on their respective distinguishing power. Given a set of flow events E and a set of signals W that implement E, the distinguishing power of W_i ⊆ W is defined by how E can be partitioned with respect to W_i. A finer partition means higher distinguishing power. For example, suppose two flow events on link (cpu0, DCache0) are implemented by eight signals b … b with the following encodings.

(cpu0 : DCache0 : wr req) 0100 0000
(cpu0 : DCache0 : rd req) 1000 0000

Under these encodings, signals b … b have zero distinguishing power. b and b have equal power; therefore, selecting either one would be fine. Selecting signals with high distinguishing power helps to address issue […]

1. […] t or t are selected; observing tags is not needed.
2. Shared events t or t are selected along with tags.

For option 2, tags can help to map events to the flow instances with the same tags during the trace analysis, thus addressing issue […] p and p. When a flow instance reaches p, which branch to take next depends on whether the cache operation is a hit or a miss. Similarly, which branch to take at p depends on whether the cache snoop operation is a hit or a miss. If these two status signals are available and included for observation, there is no need to select branch events. Observing start/end events plus those status signals is sufficient to identify the branches followed by a flow instance during system execution.
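The notion of distinguishing power can be made concrete with a small Python sketch using the two encodings from the example above. Two choices here are assumptions rather than definitions from the paper: signal positions are counted from the leftmost character of the encoding string, and the power of a signal subset is measured as the number of blocks in the partition it induces. The helper names are hypothetical.

```python
def partition_by_signals(encodings, signal_positions):
    """Group events whose encodings agree on every selected bit position."""
    blocks = {}
    for event, bits in encodings.items():
        key = tuple(bits[i] for i in signal_positions)
        blocks.setdefault(key, []).append(event)
    return sorted(sorted(block) for block in blocks.values())


def distinguishing_power(encodings, signal_positions):
    """Number of blocks in the induced partition (assumed measure: more is finer)."""
    return len(partition_by_signals(encodings, signal_positions))


# Encodings from the example; position 0 is the leftmost bit (an assumption).
encodings = {
    "cpu0:DCache0:wr_req": "01000000",
    "cpu0:DCache0:rd_req": "10000000",
}

for position in range(8):
    print(position, distinguishing_power(encodings, [position]))
```

With these encodings, each of the two leftmost positions separates the two events on its own, while the remaining six positions carry identical values for both events and therefore leave them in a single block.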

5. EXPERIMENTAL RESULTS

To the best of our knowledge, this work is the first to present a systematic approach to post-silicon trace analysis guided by system-level protocols. We were not able to find any similar previous work against which ours can be evaluated and compared. The closest work to ours is [15]. However, our work is more general and developed with practical considerations. Additionally, the work in [15] is discussed and evaluated based on an abstract transaction-level model, while our approach is evaluated on an RTL model.

The ideas and techniques presented in this paper are evaluated on a multi-core SoC prototype, as shown in Figure 1, which implements a number of common industrial system-level protocols including cache coherence and power management. This prototype is a cycle- and pin-accurate RTL model written in VHDL. Even though this model is simple compared to real SoC designs, it is much more sophisticated than the gate-level benchmark suites typically considered as targets for post-silicon analysis [10, 4, 5].

Since the proposed trace analysis approach is communication centric, the focus of this model is the implementation of system-level protocols. The CPUs are treated as a test environment where software programs are simulated in VHDL to trigger various protocols. Therefore, there is no instruction cache as no instructions are involved when the CPUs are simulated. The peripheral blocks, GFX, PMU, Audio, etc., are also described as abstract models that generate events to initiate flows or to respond to incoming requests.

More details of some system-level protocols implemented in our model can be found in [3]. They include downstream read/write protocols for each CPU, upstream read/write protocols for the peripheral blocks, and system power management protocols, which are abstracted from real industrial protocols. These system-level protocols are supported by inter-block communication protocols based on the ARM AXI4-lite [1]. A total of 16 flows are implemented for this prototype.

A flow event is generated by a source and consumed by a destination through messages transmitted over the link between them. In our model, each message is organized as follows.

⟨Val( ), Cmd( ), Tag( ), Sid( ), Addr( ), Data( )⟩

The meanings of the message fields are given below. The numbers following the individual fields indicate their respective widths. Note that not all fields are used on all links. That model has over four thousand single-bit signals.

Val indicates validity of a message.

Cmd carries operations to be performed by the target block.

Tag is used by Bus to identify the original sources of messages from different blocks that go to the same destination, e.g. memory wr_req from Bus in response to wr_req from both CPUs.

Sid is a unique number generated by a component to represent sequencing information of flows initiated by the same component.

Addr carries the memory address at the target block where Cmd is applied.

Data carries data to a target or from a source. Its width can vary depending on the links where a message is sent. On the links between Cache and Bus, the width is equal to the size of the cache block, which is 64 bytes. For all the other links, the width is 32 bits.
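The message layout and the link-dependent Data width can be summarized in a short Python sketch. It is only an illustration: the `Message` class, the block names, and the name-matching helpers are hypothetical, the widths of the other fields are left out, and the sketch assumes the 64-byte width applies in both directions between a cache and the bus.

```python
from dataclasses import dataclass

CACHE_BLOCK_BYTES = 64   # cache block size stated in the text
DEFAULT_DATA_BITS = 32   # Data width on all other links


@dataclass(frozen=True)
class Message:
    """One message on a link; field widths other than Data are not modeled."""
    val: bool   # validity of the message
    cmd: str    # operation to be performed by the target block
    tag: int    # original source, as tracked by the bus
    sid: int    # sequencing number issued by the initiating component
    addr: int   # memory address where cmd is applied
    data: int   # payload


def is_cache(block):
    # Hypothetical naming convention: cache blocks contain "cache" in their name.
    return "cache" in block.lower()


def is_bus(block):
    return block.lower() == "bus"


def data_width_bits(src, dst):
    """Data width in bits for a message sent from src to dst."""
    if (is_cache(src) and is_bus(dst)) or (is_bus(src) and is_cache(dst)):
        return CACHE_BLOCK_BYTES * 8
    return DEFAULT_DATA_BITS


print(data_width_bits("Cache0", "Bus"))   # cache block width in bits
print(data_width_bits("CPU0", "Cache0"))  # default width in bits
```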

Test Environment

The prototype is simulated in a random test environment where CPUs, GFX, and other peripheral blocks are programmed to randomly select a flow to initiate in each clock cycle. The contents of Cmd, Addr, and Data in each activated flow are set randomly. Additionally, CPUs can activate power management protocols non-deterministically. Each of these blocks activates a total of 100 flow instances during the entire simulation.

Trace Signal Selection

In the experiments, different selections of trace signals are produced as discussed in Section 4, and their impacts on the complexity and accuracy of the trace analysis approach are evaluated. The list below explains the selections at the system level, while information on the bit-level selection is given in Table 1.

S1 All events of all flow specifications, and all signals implementing each event, are selected. This selection offers full observation, and provides a baseline for comparing with other selections.

S2 The start and end events of all protocols are selected. Furthermore, for each branch in each flow, one unique event is selected.

S3 The start and end events of all protocols are selected. Furthermore, for each branch in each flow, a highly shared event is selected.

S4 The start and end events of all protocols are selected. Instead of selecting events for branches in each flow, signals whose states control the flow branching are selected.

At the bit level, the Addr and Data fields are not considered. On the other hand, the Val bit is always selected so that valid messages can be identified from observed traces. For selections S2, S3, and S4, experiments are performed to evaluate all combinations of Cmd, Tag and Sid fields.
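As a small illustration of the bit-level configurations evaluated for S2, S3, and S4, the following Python sketch enumerates every combination of the Cmd, Tag, and Sid fields with Val always traced and Addr and Data never traced. The function name `field_combinations` is hypothetical, and the sketch only lists configurations; it does not reproduce the experiments.

```python
from itertools import product

OPTIONAL_FIELDS = ("Cmd", "Tag", "Sid")


def field_combinations():
    """List each traced-field configuration; Val is part of every configuration."""
    combos = []
    for flags in product((False, True), repeat=len(OPTIONAL_FIELDS)):
        traced = ["Val"] + [f for f, on in zip(OPTIONAL_FIELDS, flags) if on]
        combos.append(tuple(traced))
    return combos


for combo in field_combinations():
    print(combo)
```

Three optional fields give eight configurations per system-level selection, ranging from Val alone to all of Val, Cmd, Tag, and Sid.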

In Table 1, a mark means that all signals implementing a particular field for all selected events in selection SX are traced. Otherwise, none of those signals is traced. Third row ( […] Cmd or Sid has severe impacts on the trace analysis as explained in issues […] Tag has negative impacts, but not as severe. The trace analysis can still finish even though it takes more time and memory.

Next, compare the results obtained by selecting Cmd and Sid but no Tag under S2 to S4. The results with S4 are much better than those with S2 or S3. This is because no branch events are selected for S4; therefore, issues […] Cmd, Tag and Sid are applied to all events as the result of the system-level selection. A finer selection can be used to reduce trace signals if unique events and shared events are considered separately. For unique events, the sources where they are generated are known from flows; therefore, Tags need not be traced. Shared events may be results of flow instances initiated by different components; therefore, tracing Tags is necessary. On the other hand, tracing or not tracing Cmds has little impact on the trace analysis. These points are supported by the results shown in the columns under “U S”. Under S2, compare the results under “U S” against those with all three fields selected. We can see that the runtime performance and the complexity and accuracy of the trace analysis are similar, while the trace signals are reduced with the finer selection. Comparing the results under “U S” against those obtained with only Cmd and Sid selected, the complexity drops significantly. The same conclusion can be drawn for S3 and S4.

From the above discussion, it is necessary to trace signals implementing Cmd and Sid whenever possible, and to trace as many signals implementing Tag as allowed to reduce the complexity of the trace analysis even more. If Tag or Sid is not part of the design, we recommend adding DFx circuitry in order to trace such information. In the above experiments, the final execution scenarios under different signal selections, if available, contain the correct number of flow instances initiated, and the orderings among the flow instances, as generated by the test environment, are correctly captured.

Table 1: Runtime results of trace analysis with different trace signal selections. Runtime is in seconds and memory usage is in MB. − indicates that the results are not available because the 10 minute time limit was exceeded. Columns: system-level selection S1, S2, S3, S4, with “U S” sub-columns; rows: Cmd, Tag, Sid.

6. RELATED WORK

Our work is closely related to communication-centric and transaction-based debug. An early pioneering work is described by Goossens et al. [9, 14, 8], which advocates a focus on observing activities on the interconnect network among IP blocks, and mapping these activities to transactions for better correlation between computations and communications. A similar transaction-based debug approach is presented by Gharehbaghi and Fujita [7]. It proposes an automated extraction of state machines at the transaction level from high-level design models. From an observed failure trace, it tries to derive a set of feasible transaction traces that lead to the observed failure state. However, this approach requires manual inputs and may not be able to derive such traces. Singerman et al. [13] deploy a central repository of system events and simple transactions defined by architects and IP designers. It spans a wide spectrum of post-silicon validation, including DFx instrumentation, test generation, coverage, and debug. Also, in Abarbanel et al. [2], a model at a higher level of abstraction, flows, is proposed. Flows are used to specify more sophisticated cross-IP transactions such as power management, security, etc., and to facilitate reuse of the efforts of the architectural analysis to check HW/SW implementations.

7. CONCLUSION

An improved trace analysis approach for post-silicon debug is presented where observed raw silicon traces are interpreted with respect to system flow specifications. In this approach, a new formulation of flow execution scenarios is described where more diverse information among flows can be captured and represented. A trace signal selection framework is also described in support of the proposed trace analysis approach. Some observations on trace signal selections and their impacts on the accuracy and efficiency of the trace analysis are discussed. Experiments on a non-trivial SoC prototype reveal insights on the impacts of different signal selections on the complexity and accuracy of the trace analysis. In the future, we plan to perform a more extensive and in-depth study on trace signal selections guided by system flow specifications.

8. REFERENCES

Proceedings of DAC’14, pages 2:1–2:4, 2014.

[3] M. Amrein. System-level trace signal selection for post-silicon debug using linear programming. Master’s thesis, Univ. of Illinois Urbana-Champaign, May 2015.

[4] K. Basu and P. Mishra. Efficient trace signal selection for post silicon validation and debug. In VLSI Design (VLSI Design), pages 352–357. IEEE, 2011.

[5] D. Chatterjee, C. McCarter, and V. Bertacco. Simulation-based signal selection for state restoration in silicon debug. In ICCAD, pages 595–601. IEEE, 2011.

[6] H. D. Foster. Trends in functional verification: A 2014 industry study. In DAC, pages 48:1–48:6, 2015.

[7] A. M. Gharehbaghi and M. Fujita. Transaction-based post-silicon debug of many-core system-on-chips. In ISQED, pages 702–708, 2012.

[8] K. Goossens, B. Vermeulen, and A. B. Nejad. A high-level debug environment for communication-centric debug. In Proceedings of DATE’09, pages 202–207, 2009.

[9] K. Goossens, B. Vermeulen, R. v. Steeden, and M. Bennebroek. Transaction-based communication-centric debug. In Proceedings of NOCS’07, pages 95–106, 2007.

[10] H. F. Ko and N. Nicolici. Algorithms for state restoration and trace-signal selection for data acquisition in silicon debug. IEEE TCAD, 28(2):285–297, 2009.

[11] S. Ma, D. Pal, R. Jiang, S. Ray, and S. Vasudevan. Can’t see the forest for the trees: State restoration’s limitations in post-silicon trace signal selection. ICCAD ’15, pages 1–8, Piscataway, NJ, USA, 2015. IEEE Press.

[12] P. Patra. On the cusp of a validation wall. IEEE Des. Test, 24(2):193–196, Mar. 2007.

[13] E. Singerman, Y. Abarbanel, and S. Baartmans. Transaction based pre-to-post silicon validation. In Proceedings of DAC’11, pages 564–568, 2011.

[14] B. Vermeulen and K. Goossens. A noc monitoring infrastructure for communication-centric debug of embedded multi-processor socs. In VLSI-DAT ’09, pages 183–186, 2009.

[15] H. Zheng, Y. Cao, S. Ray, and J. Yang. Protocol-guided analysis of post-silicon traces under limited observability. In Proceedings of ISQED’16, pages 301–306, March 2016.

APPENDIX

A. CPU READ/WRITE DOWNSTREAM PROTOCOL

X = {0, 1}, X' = 1 − X
Target = {Memory, USB, UART, AUDIO, GFX}
CMD = {read, write}

Flow events labeling the LPN transitions:
(CPU X : Cache X : CMD req)
(Cache X : Cache X' : snp CMD req)
(Cache X' : Cache X : snp CMD resp)
(Cache X : Bus : CMD req)
(Bus : Target : CMD req)
(Target : Bus : CMD resp)
(Bus : Cache X : CMD resp)
(Cache X : CPU X : CMD resp)
(Cache X : CPU X : CMD resp)
(Cache X : CPU X : CMD resp)

Figure 5: LPN formalization of a CPU read/write protocol.

Figure 6: MSQ of CPU downstream read/write protocol.

B. UPSTREAM READ/WRITE PROTOCOL

Upstream read:

Initiator = {GFX, USB, AUDIO, UART}
Target = {Memory, USB, UART, AUDIO, GFX}

Note that a peripheral cannot initiate a read flow to read itself.

Flow events labeling the LPN transitions:
(Initiator : Bus : rd req)
(Bus : Cache 0 : snp req)
(Cache 0 : Cache 1 : snp req)
(Cache 1 : Cache 0 : snp resp)
(Cache 0 : Bus : snp resp)
(Bus : Target : rd / wt req)
(Target : Bus : rd / wt resp)
(Bus : I : rd resp)
(Cache 0 : Bus : snp resp)
(Bus : Initiator : rd resp)

Figure 7: LPN formalization of an upstream read protocol.

Figure 8: MSQ of upstream read protocol.

Upstream write:

Initiator = {GFX, AUDIO}
Target = {Memory, USB, UART, AUDIO, GFX}

Note that a peripheral cannot initiate a write flow to write itself.

Flow events labeling the LPN transitions:
(Initiator : Bus : wr req)
(Bus : Cache 0 : snp wr req)
(Cache 0 : Cache 1 : snp wr req)
(Cache 1 : Cache 0 : snp wr resp)
(Cache 0 : Bus : snp wr resp)
(Bus : Target : wr req)
(Target : Bus : wr resp)
(Bus : Initiator : wr resp)
(Bus : Initiator : wr resp)
(Cache 0 : Bus : snp wr resp)

Figure 9: LPN formalization of an upstream write protocol.

Figure 10: MSQ of upstream write protocol.

C. CPU WRITE BACK PROTOCOL

X = { }
Target = {Memory, USB, UART, AUDIO, GFX}

Flow events labeling the LPN transitions:
(Cache X : Bus : wb req)
(Bus : Target : wb req)
(Target : Bus : wb resp)

Figure 11: LPN formalization of a CPU write back protocol.

Figure 12: MSQ of power write back protocol.

D. CPU POWER ON/OFF PROTOCOL

CMD = {pwr on, pwr off}
X = { }
Target = {Memory, USB, UART, AUDIO, GFX}

Flow events labeling the LPN transitions:
(CPU X : Cache X : CMD req)
(Cache X : Bus : CMD req)
(Bus : PWR : CMD req)
(PWR : Target : CMD req)
(Target : PWR : CMD resp)
(PWR : Bus : CMD resp)
(Bus : Cache X : CMD resp)
(Cache X : CPU X : CMD resp)

Figure 13: LPN formalization of a CPU power on/off protocol.

Figure 14: MSQ of power on/off protocol.