A Post-Silicon Trace Analysis Approach for System-on-Chip Protocol Debug
Yuting Cao, Hao Zheng
Computer Science & Engineering, U. South Florida, Tampa, FL 33620, {cao2, haozheng}@usf.edu
Sandip Ray, Jin Yang
Strategic CAD Lab, Intel, Hillsboro, OR, {sandip.ray, jin.yang}@intel.com
ABSTRACT
Reconstructing system-level behavior from silicon traces is a critical problem in post-silicon validation of System-on-Chip designs. Current industrial practice in this area is primarily manual, depending on the collaborative insights of architects, designers, and validators. This paper presents a trace analysis approach that exploits architectural models of the system-level protocols to reconstruct design behavior from partially observed silicon traces in the presence of ambiguous and noisy data. The output of the approach is a set of all potential interpretations of a system's internal executions, abstracted to the system-level protocols. To support the trace analysis approach, a companion trace signal selection framework guided by system-level protocols is also presented, and its impacts on the complexity and accuracy of the analysis approach are discussed. The approach and the framework have been evaluated on a multi-core system-on-chip prototype that implements a set of common industrial system-level protocols.
1. INTRODUCTION
Post-silicon validation makes use of pre-production silicon integrated circuits (ICs) to ensure that the fabricated system works as desired under actual operating conditions with real software. It is a critical component of the design validation life-cycle for modern microprocessors and system-on-chip (SoC) designs. Unfortunately, it is also highly complex, is performed under aggressive schedules, and accounts for more than 50% of the overall design validation cost [12]. An SoC design is often composed of a large number of pre-designed hardware or software blocks (often referred to as “intellectual properties” or “IPs”) that coordinate through complex protocols to implement system-level behavior [6]. An execution trace of a system typically involves activities from the CPU, audio controller, display controller, wireless radio antenna, etc., reflecting the interleaved execution of a potentially large number of communication protocols. As SoCs integrate more IPs, the interactions among the IPs become increasingly complex. Moreover, modern interconnects are highly concurrent, allowing multiple transactions to be processed simultaneously for scalability and performance, and such interconnects are an important source of design errors. On the other hand, observability limitations allow only a small number of participating signals to be actually traced during silicon execution. Furthermore, electrical perturbations cause silicon data to be noisy, lossy, and ambiguous. It is therefore non-trivial during post-silicon debug to identify all participating protocols and pinpoint the interleavings that result in an observed trace.

Previous work [15] proposed a method for correlating silicon traces with system-level protocol specifications. The idea was to reconstruct protocol execution scenarios from a partially observed silicon trace; these scenarios provide abstract views of system internal executions to facilitate post-silicon SoC debug.
While that work showed promising results, it has a number of deficiencies that preclude its applicability in practice. First, there was no way to qualify or rank the quality of the protocol execution scenarios generated by the reconstruction procedure. Under poor observability conditions, the algorithm could generate hundreds or thousands of potential protocol execution scenarios consistent with a partially observed trace. Without a metric to rank the quality of these reconstructions, the debugger is faced with the unenviable task of wading through these potential scenarios to infer what may actually have happened in a specific silicon execution. Second, based on past experience, interleavings of different protocol executions are a major source of functional bugs. Since the method developed in [15] does not capture orderings among different protocol executions, the results obtained with that method offer little help for bug localization and root-causing.

This paper addresses the above deficiencies by introducing an optimized trace analysis approach. Central to this optimized approach is a new formulation of protocol execution scenarios that comprehends ordering relations among protocol executions. Quantitative metrics are also developed so that the quality of the results derived by the analysis approach, as well as its efficiency, can be measured. Trace signal selection can have a great impact on the complexity and accuracy of the trace analysis. Therefore, a companion trace signal selection framework is proposed. This framework is communication-centric and guided by system-level protocols.
Its objective is to help the trace analysis produce high-quality interpretations of observed silicon traces efficiently. Various trace signal selection strategies are evaluated and analyzed based on their impacts on the trace analysis approach, applied to a non-trivial multi-core SoC model that implements a number of common industrial system-level protocols.
2. FLOW SPECIFICATION
An SoC model, shown in Figure 1, is used to illustrate and experiment with the work described in this paper. It consists of two CPUs (CPU_X), each with a private data cache (Cache_X), a graphics engine (GFX), a power management unit (PMU), a system memory, and three peripheral blocks: an audio control unit (Audio), a UART controller (UART), and a USB controller (USB). All these blocks are connected through an interconnect fabric (Bus).

Figure 1: The block diagram of the simple SoC model (blocks: CPU 0, CPU 1, Cache 0, Cache 1, PMU, GFX, Bus, Audio, USB, UART, Memory).

System operations are realized by executions performed in various blocks that are coordinated by system-level protocols. These protocols are typically specified in architecture documents as message flow diagrams; in this paper the words “protocol” and “flow” are used interchangeably. As in [15], system flows are formalized using Labeled Petri nets (LPNs). Figure 2 shows, as an LPN, a memory write protocol initiated from a CPU CPU_X, where $X \in \{0, 1\}$ and $X' = 1 - X$.

An LPN is a tuple $(P, T, E, L, s_0)$ where $P$ is a finite set of places, $T$ is a finite set of transitions, $E$ is a finite set of events, and $L : T \to E$ is a labeling function that maps each transition $t \in T$ to an event $e \in E$. For each transition $t \in T$, its preset, denoted ${\bullet}t \subseteq P$, is the set of places connected to $t$, and its postset, denoted $t{\bullet} \subseteq P$, is the set of places that $t$ is connected to. A state $s \subseteq P$ of an LPN is a subset of places marked with tokens. Two special states are associated with each LPN: $s_0 \subseteq P$, the set of initially marked places, also referred to as the initial state, and the end state $s_{end}$, the set of places that do not lead to any transition.

A transition $t$ can be executed in a state $s$ if ${\bullet}t \subseteq s$. Executing $t$ emits the labeled event and leads to a new state $s' = (s - {\bullet}t) \cup t{\bullet}$. Therefore, executing an LPN produces a sequence of events. Execution of an LPN completes when its $s_{end}$ is reached.

For example, in Figure 2, t can be executed in the initial state s = {p}. Event (CPU_X:Cache_X:wr_req) is emitted after t is executed, and the LPN state becomes {p}. The end state is s_end = {p}.

A flow specification may also contain multiple branches describing different ways a system can execute the flow. For example, the flow shown in Figure 2 has three branches covering the cases where the cache (snoop) operation hits or misses.
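The firing rule above can be made concrete with a short sketch. The Python below is a minimal illustration, assuming one token per place so that a state is simply a set of marked places; the class names and the two-transition toy flow are hypothetical and are not the Figure 2 LPN.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

Event = Tuple[str, str, str]  # (src, dest, cmd)


@dataclass(frozen=True)
class Transition:
    name: str
    preset: FrozenSet[str]   # •t: places that must be marked for t to fire
    postset: FrozenSet[str]  # t•: places marked after t fires
    label: Event             # L(t)


@dataclass(frozen=True)
class LPN:
    places: FrozenSet[str]
    transitions: Tuple[Transition, ...]
    s0: FrozenSet[str]       # initial state: the initially marked places

    @property
    def s_end(self) -> FrozenSet[str]:
        # End state: places that lead to no transition (the terminals).
        fed = set()
        for t in self.transitions:
            fed |= t.preset
        return self.places - fed

    def enabled(self, s: FrozenSet[str]) -> List[Transition]:
        # t can be executed in s iff •t ⊆ s.
        return [t for t in self.transitions if t.preset <= s]

    def fire(self, s: FrozenSet[str], t: Transition) -> FrozenSet[str]:
        # Executing t leads to s' = (s − •t) ∪ t•.
        if not t.preset <= s:
            raise ValueError(f"{t.name} is not enabled")
        return (s - t.preset) | t.postset

    def run(self, names: List[str]) -> Tuple[FrozenSet[str], List[Event]]:
        # Fire transitions by name starting from s0; return final state and emitted events.
        by_name = {t.name: t for t in self.transitions}
        s, events = self.s0, []
        for n in names:
            t = by_name[n]
            s = self.fire(s, t)
            events.append(t.label)
        return s, events


# Hypothetical two-step flow (not the Figure 2 LPN): p0 --t1--> p1 --t2--> p2.
t1 = Transition("t1", frozenset({"p0"}), frozenset({"p1"}), ("CPU_0", "Cache_0", "wr_req"))
t2 = Transition("t2", frozenset({"p1"}), frozenset({"p2"}), ("Cache_0", "CPU_0", "wr_resp"))
toy = LPN(frozenset({"p0", "p1", "p2"}), (t1, t2), frozenset({"p0"}))

state, events = toy.run(["t1", "t2"])
print(sorted(state), state == toy.s_end)  # ['p2'] True
print(events)  # [('CPU_0', 'Cache_0', 'wr_req'), ('Cache_0', 'CPU_0', 'wr_resp')]
```

Executing the toy flow emits one event per fired transition, and the run completes exactly when the marking equals s_end, mirroring the definition above.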
3. POST-SILICON TRACE ANALYSIS
3.1 Previous Work
This section recaps the previous approach in [15]. The objective of the trace analysis is to reconstruct design-internal behavior with respect to given system-level flow specifications F from a silicon trace partially observed on a small number of hardware signals. The off-chip analysis includes two broad phases: (1) trace abstraction, which maps a silicon trace into a sequence of flow events, i.e., higher-level architectural constructs such as messages and operations, and (2) trace interpretation, which infers possible flow execution scenarios that are compliant with the abstracted event sequence.

Figure 2: LPN formalization of a CPU write protocol. Each LPN transition is labeled with an event (src, dest, cmd), where cmd is a command sent from a source component src to a destination component dest. The places without outgoing edges are terminals, which indicate termination of the protocols represented by the LPNs. (Transition labels in the figure: (CPU_X:Cache_X:wr_req), (Cache_X:Cache_X':snp_wr_req), (Cache_X':Cache_X:snp_wr_resp), (Cache_X:Bus:wr_req), (Bus:Mem:rd_req), (Mem:Bus:rd_resp), (Bus:Cache_X:wr_resp), and three terminal transitions labeled (Cache_X:CPU_X:wr_resp).)

To illustrate the basic idea, consider the system flow in Figure 2, which we call F. Suppose that the following flow execution trace is abstracted from an observed silicon trace produced by executing a design that implements F.

t t t t t t t t t t t t ...

Here the flow events are referred to by their transition names in the LPN. The first four events result in the following flow execution scenario:

{(F, , {p}), (F, , {p})}.   (1)

A flow execution scenario is defined as a set of flow instances and their respective current states after some events are processed [15]. It can be viewed as an abstraction of system states with respect to system flows.
The above execution scenario indicates that the first four events result from executing those two flow instances of F from their initial states to the shown states. The first event t may result from executing either F , or F , but exactly which one is unknown due to limited observability.
Both possible cases are considered, and the two execution scenarios below are derived as a result of interpreting t.

{(F, , {p}), (F, , {p})}
{(F, , {p}), (F, , {p})}.   (2)

After the next event t is handled, the above two execution scenarios are reduced to the one shown below.

{(F, , {p}), (F, , {p})}.

After the remaining six events are handled, the following execution scenario is derived.

{(F, , {p}), (F, , {p})}

As another example, now suppose that a design with a bug generates the flow trace below.

t t t t t t t t t t t t ...

This sequence is almost the same as the previous one except that the last event is t: (Cache_X:CPU_X:rd_resp) instead of t: (Mem:Bus:rd_resp) as in the previous trace. The event (Cache_X:CPU_X:rd_resp) belongs to a different flow specification describing a CPU memory read protocol. Analyzing the trace right before this event leads to the execution scenarios below.

{(F, , {p}), (F, , {p})}
{(F, , {p}), (F, , {p})}.   (3)

However, this event cannot result from executing either flow instance in either scenario, which indicates a noncompliance of the design implementation with respect to the given flow specification. Such an event is referred to as inconsistent. In this case, the algorithm halts and returns the inconsistent event and the derived flow execution scenarios shown in (3) for the debugger to examine further.

The trace analysis approach in [15] does not capture orderings among flow instances in execution scenarios. However, from a debugger's point of view, communication protocols can be related. For example, a firmware loading protocol always happens before a firmware execution protocol. If a firmware execution protocol is found to happen before a firmware loading protocol, that possibly indicates an error in the system implementing such protocols.
Such properties cannot be checked by the previous approach. To address that problem, this paper presents a new definition of flow execution scenarios as

$\{(F_{i,j}, s_{i,j}, start_{i,j}, end_{i,j}) \mid F_i \in F\}$

where $start_{i,j}$ and $end_{i,j}$ are two indices representing the relative times when $F_{i,j}$ is initiated and completed. The ordering relations can be derived by comparing their start and end indices. For example, for two flow instances in an execution scenario, $(F_{u,v}, s_{u,v}, start_{u,v}, end_{u,v})$ and $(F_{x,y}, s_{x,y}, start_{x,y}, end_{x,y})$, $F_{u,v}$ is initiated before $F_{x,y}$ if $start_{u,v} < start_{x,y}$, and $F_{x,y}$ is initiated after $F_{u,v}$ is completed if $end_{u,v} < start_{x,y}$. The ordering relations provide more accurate information for understanding system execution under limited observability. Section 3.3 explains how $start_{i,j}$ and $end_{i,j}$ are decided during the trace analysis.

In order to support the new definition of flow execution scenarios, the trace abstraction, which in [15] maps an observed silicon trace to a linear sequence of flow events, is also generalized. An SoC design can be viewed as a group of IP blocks networked by an on-chip interconnect fabric. These blocks communicate with each other through communication links, each of which implements a protocol, such as ARM AXI, over a set of wires. The approach presented in this paper is communication-centric in that it works on silicon traces of a selected number of wires on a selected number of communication links. Suppose that there are $n$ communication links, and some wires from each link are selected for observation. A silicon trace is assumed to be a sequence $\alpha_0, \alpha_1, \ldots$ such that each $\alpha_i$ is a vector defined as $\alpha_i = \langle \alpha_{1,i}, \ldots, \alpha_{n,i} \rangle$, where $\alpha_{k,i}$ is a state on link $k$ in step $i$.

If all wires of a link are observable, then a state on that link can be uniquely mapped to a flow event of the same link. Under limited observability, a state on a link is typically mapped to a set of flow events. Therefore, a silicon trace is abstracted to a sequence $\vec{E}_0, \vec{E}_1, \ldots$ where

$\vec{E}_i = \langle E_{1,i}, \ldots, E_{n,i} \rangle$   (4)

is a vector of sets of flow events abstracted from $\alpha_i$, and each $E_{k,i}$ in $\vec{E}_i$ is a set of flow events abstracted from state $\alpha_{k,i}$ in $\alpha_i$. No temporal ordering exists among the events in $\vec{E}_i$. On the other hand, for two events $e_i \in \vec{E}_i$ and $e_j \in \vec{E}_j$ such that $i < j$, $e_i$ happens before $e_j$.

Based on the different levels of information captured, this paper classifies flow execution scenarios as follows.

• Type-1 execution scenarios capture the number of instances of each flow specification initiated in a silicon trace, and the relative orderings of their initiations.
• Type-2 execution scenarios, on top of what is captured by Type-1 scenarios, capture the completion of each flow instance. This additional information can be used to identify potential problems if any flow instance is not completed. Furthermore, Type-2 execution scenarios capture the relative orderings among all flow instances as described above.
• Type-3 execution scenarios, on top of what is captured by Type-2 scenarios, capture information on the execution paths followed by individual flow instances. This information gives debuggers a means to examine in detail how each flow instance is executed.

These different execution scenarios can be used to provide different views of system execution, from coarse-grained to more detailed ones, at different stages of debug.
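To illustrate the generalized abstraction, the sketch below maps partially observed link states to sets of candidate flow events, producing one vector of sets per trace step. It assumes, hypothetically, that each flow event on a link is carried by a fixed bit pattern on that link's wires and that an observed state matches every event agreeing with it on the traced wires; real link encodings (e.g., AXI channels) are far richer, and all function names and encodings here are illustrative.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

Event = Tuple[str, str, str]  # (src, dest, cmd)


def abstract_link_state(observed: Dict[int, int],
                        encodings: Dict[Event, Sequence[int]]) -> FrozenSet[Event]:
    """Return E_{k,i}: every flow event of link k whose wire encoding agrees
    with the sampled values on the observed wires.

    observed  -- wire index -> sampled bit; untraced wires are simply absent
    encodings -- hypothetical full wire vector carrying each event on this link
    A link state that matches no event yields an empty set.
    """
    return frozenset(
        ev for ev, bits in encodings.items()
        if all(bits[w] == v for w, v in observed.items())
    )


def abstract_step(alpha_i: Sequence[Dict[int, int]],
                  links: Sequence[Dict[Event, Sequence[int]]]) -> Tuple[FrozenSet[Event], ...]:
    """Map alpha_i = <alpha_{1,i}, ..., alpha_{n,i}> to the vector E_i of event sets."""
    return tuple(abstract_link_state(a, enc) for a, enc in zip(alpha_i, links))


def abstract_trace(trace, links) -> List[Tuple[FrozenSet[Event], ...]]:
    """Abstract a whole silicon trace alpha_0, alpha_1, ... step by step."""
    return [abstract_step(alpha, links) for alpha in trace]


# Hypothetical 2-wire command encodings on two links.
cache_bus = {("Cache_0", "Bus", "wr_req"): (0, 1), ("Cache_0", "Bus", "rd_req"): (1, 1)}
mem_bus = {("Bus", "Mem", "rd_req"): (1, 0), ("Mem", "Bus", "rd_resp"): (0, 1)}

# Only wire 1 of the cache link is traced; both wires of the memory link are traced.
E = abstract_trace([({1: 1}, {0: 0, 1: 1})], [cache_bus, mem_bus])
print(E[0][0] == {("Cache_0", "Bus", "wr_req"), ("Cache_0", "Bus", "rd_req")})  # True: ambiguous
print(E[0][1] == {("Mem", "Bus", "rd_resp")})                                    # True: unique
```

The partially traced cache link yields two candidate events for a single state, which is exactly the ambiguity that the interpretation phase must resolve; the fully traced memory link yields a unique event.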
Algorithm 1 shows the top-level procedure for detecting internal flow executions based on a partially observed silicon trace and checking compliance with respect to a given flow specification. It takes as inputs F, a set of system-level flow specifications, and a signal trace ρ, which is assumed to be a sequence of states on a set of observable trace signals, where each state is uniquely indexed starting from 0.

This algorithm scans trace ρ starting from index h, initialized to 0, extracts all possible flow events from ρ at index h as described in Section 3.2 (line 6), and maps each of those extracted flow events to update the already detected execution scenarios (line 11). The algorithm terminates when one of two conditions holds. If an inconsistency is encountered, the current set of execution scenarios M, together with indices h and i, is returned (line 16). Index h provides temporal information on when the inconsistency occurs, while i provides spatial information on which communication link the inconsistent event is transmitted. If no inconsistency is found, the set of all execution scenarios compliant with the observed trace is returned (line 18) when index h exceeds the length of the trace.

Algorithm 1: Check-Compliance(F, ρ)
 1  /* F: a set of flow specifications */
 2  /* ρ: a partially observed silicon trace */
 3  M ← {∅}
 4  h ← 0
 5  while h ≤ |ρ| do
 6      E⃗ ← abstract(ρ, h)
 7      foreach E_i of E⃗ do
 8          inconsistent ← true
 9          foreach e ∈ E_i do
10              foreach scen ∈ M do
11                  Scens ← analysis(F, scen, e, h)
12                  if Scens ≠ ∅ then
13                      inconsistent ← false
14                      M ← (M − scen) ∪ Scens
15          if inconsistent = true then
16              return (M, h, i)
17      h ← h + 1
18  return (M, −1, −1)

Algorithm 2 takes the specification F, an execution scenario scen, a flow event e, and the index h of the trace where e is extracted, and it produces a set of execution scenarios R consistent with e. This algorithm performs two tasks. In the first task (lines 5-12), the algorithm checks every flow instance to decide whether e can be accepted.
If such an instance is found (line 7), it is updated with the new state resulting from e (line 9). Furthermore, if e causes the flow instance to complete, its index $end_{i,j}$ is set to h (lines 10-11), indicating the completion of that instance due to event e at step h of the trace. In the second task, all possibilities where e can initiate a new flow instance are considered (lines 14-20). If a new instance can be initiated, its $start_{i,j}$ is set to h, indicating the initiation of that instance due to a signal event at step h of the trace.

Due to limited observability, reconstructing system-level executions from an observed silicon trace is an imprecise process. The large number of execution scenarios typically derived during the analysis takes large amounts of runtime and memory to process and store, making the analysis less efficient. This is referred to as the complexity problem of the trace analysis. After the analysis is done, a large number of derived execution scenarios makes the analysis results difficult to understand, and thus less helpful for debugging. A single flow execution scenario derived at the end of the trace analysis provides much more precise information for debug than, say, ten candidate flow execution scenarios. This is referred to as the accuracy problem of the trace analysis.
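Because Algorithm 2 stamps $start_{i,j}$ and $end_{i,j}$ with trace indices, the ordering relations defined for the new scenario format reduce to integer comparisons. The sketch below applies them to the firmware example above (loading must complete before execution starts) over a single execution scenario. The flow names fw_load and fw_exec are hypothetical, and the sketch assumes, as in Algorithm 2, that a newly initiated instance is recorded with end = −1 until it completes; a naive test end < start would then wrongly treat an open instance as completed, hence the guard.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FlowInstance:
    flow: str      # name of the flow specification F_i (hypothetical names below)
    start: int     # start_{i,j}: trace index of the initiating event
    end: int = -1  # end_{i,j}: trace index of the completing event, -1 while open


def initiated_before(a: FlowInstance, b: FlowInstance) -> bool:
    """a is initiated before b iff start_a < start_b."""
    return a.start < b.start


def initiated_after_completion(b: FlowInstance, a: FlowInstance) -> bool:
    """b is initiated after a completes iff end_a < start_b.
    The end != -1 guard stops an open instance (end = -1) from passing trivially."""
    return a.end != -1 and a.end < b.start


def unordered_fw_exec(scenario: Iterable[FlowInstance],
                      load: str = "fw_load", execute: str = "fw_exec") -> List[FlowInstance]:
    """Return execution instances not preceded by any completed loading instance."""
    scen = list(scenario)
    loads = [f for f in scen if f.flow == load]
    return [x for x in scen
            if x.flow == execute
            and not any(initiated_after_completion(x, ld) for ld in loads)]


good = [FlowInstance("fw_load", start=3, end=10), FlowInstance("fw_exec", start=12)]
bad = [FlowInstance("fw_load", start=3, end=10), FlowInstance("fw_exec", start=5)]
print(initiated_before(good[0], good[1]))  # True
print(unordered_fw_exec(good))             # []
print(unordered_fw_exec(bad))              # [FlowInstance(flow='fw_exec', start=5, end=-1)]
```

An empty result means the scenario respects the ordering property; any returned instance is a candidate ordering violation for the debugger to inspect.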
Algorithm 2: Analysis(F, scen, e, h)
 1  /* scen = {(F_{i,j}, s_{i,j}, start_{i,j}, end_{i,j})} */
 2  /* e is a flow event abstracted from the silicon trace at index h */
 3  R ← ∅
 4  /* Check if e can change the state of any existing flow instance of scen */
 5  foreach (F_{i,j}, s_{i,j}, start_{i,j}, end_{i,j}) ∈ scen do
 6      s'_{i,j} ← accept(F_{i,j}, s_{i,j}, e)
 7      if s'_{i,j} ≠ ∅ then
 8          Let scen' be a copy of scen
 9          Replace s_{i,j} of scen' with s'_{i,j}
10          if s'_{i,j} = F_i.s_end then
11              Update end_{i,j} of scen' with h
12          R ← R ∪ scen'
13  /* Check if e can extend scen by initiating new flow instances */
14  foreach F_i ∈ F do
15      create a new instance F_{i,h}
16      s'_{i,h} ← accept(F_{i,h}, F_i.s_0, e)
17      if s'_{i,h} ≠ ∅ then
18          Let scen' be a copy of scen
19          scen' ← scen' ∪ (F_{i,h}, s'_{i,h}, h, −1)
20          R ← R ∪ scen'
21  return R

The factors contributing to the complexity and accuracy problems are explained below.

1. A signal event mapped to a set of flow events. Due to limited observability, a signal event of an observed silicon trace is often interpreted as a number of different flow events, which typically leads to the derivation of a number of different execution scenarios. This situation is exacerbated by the fact that silicon traces are often very long, which could lead to excessively large numbers of possible execution scenarios derived during or at the end of the analysis.
2. A flow event mapped to different temporal flow instances. Temporal flow instances are flow instances activated by the same component, e.g., read/write flows activated by CPU_0. If several temporal instances of some flows are activated by a component, mapping flow events to those flow instances can be ambiguous. For example, suppose that an execution scenario includes two instances of the flow shown in Figure 2 activated by CPU_0, one in state {p} and the other in state {p}. An instance of flow event (Cache_0:CPU_0:wr_resp) can be mapped to either flow instance, leading to two new execution scenarios from the current one.
3. A flow event mapped to flow instances activated by different components. This situation can happen when flow instances that share some common events are activated by different components. For example, suppose an execution scenario has two instances of the flow shown in Figure 2, one activated by CPU_0 and the other by CPU_1, and both are in state {p}. A flow event (Mem:Bus:rd_resp) can be mapped to either of these two instances, leading to two new execution scenarios derived from the current one.

The above issues can be mitigated by good signal selection, discussed in the following section. In order to evaluate the impacts of different trace signal selections on the complexity and accuracy of the trace analysis, this paper introduces two quantitative metrics. The complexity is measured by the peak count of flow execution scenarios encountered during the analysis process, i.e., the largest size of M encountered during the execution of Algorithm 1. The accuracy is measured by the final count of flow execution scenarios derived at the end of the analysis process, i.e., the size of M returned on either line 16 or line 18 of Algorithm 1.
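The Python sketch below follows the structure of Algorithms 1 and 2 and also records the two metrics: the peak size of M (complexity) and the final size of M (accuracy). Several choices are assumptions of this sketch rather than details from the paper: flows are single-token LPNs whose states are stored as sorted tuples of places; the trace is already abstracted into vectors of non-empty event sets; accept may return several successor states; and the events of one set E_i are treated as mutually exclusive alternatives, so only scenarios that explain the link state are kept. The two write flows, wr0 and wr1, are hypothetical; they share the event (Mem:Bus:rd_resp) to reproduce contributing factor 3.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterator, Sequence, Set, Tuple

Event = Tuple[str, str, str]            # (src, dest, cmd)
State = Tuple[str, ...]                 # marked places, sorted so states are orderable
Instance = Tuple[str, int, int, State]  # (flow name, start, end, state); end = -1 while open
Scenario = Tuple[Instance, ...]         # sorted tuple = canonical multiset of instances


@dataclass(frozen=True)
class Flow:
    s0: State
    transitions: Tuple[Tuple[FrozenSet[str], FrozenSet[str], Event], ...]  # (•t, t•, L(t))

    @property
    def s_end(self) -> State:
        # Terminal places: those that feed no transition.
        places, fed = set(self.s0), set()
        for pre, post, _ in self.transitions:
            places |= pre | post
            fed |= pre
        return tuple(sorted(places - fed))


def accept(flow: Flow, s: State, e: Event) -> Iterator[State]:
    """Yield each state reachable from s by one enabled transition labelled e."""
    marked = frozenset(s)
    for pre, post, label in flow.transitions:
        if label == e and pre <= marked:
            yield tuple(sorted((marked - pre) | post))


def analysis(flows: Dict[str, Flow], scen: Scenario, e: Event, h: int) -> Set[Scenario]:
    """Algorithm 2: all scenarios obtained by explaining event e (seen at step h) in scen."""
    R: Set[Scenario] = set()
    # Task 1: e advances an existing instance; record end = h if it completes.
    for k, (name, start, end, s) in enumerate(scen):
        for s2 in accept(flows[name], s, e):
            new_end = h if s2 == flows[name].s_end else end
            rest = scen[:k] + scen[k + 1:]
            R.add(tuple(sorted(rest + ((name, start, new_end, s2),))))
    # Task 2: e initiates a new instance whose start is h.
    for name, flow in flows.items():
        for s2 in accept(flow, flow.s0, e):
            R.add(tuple(sorted(scen + ((name, h, -1, s2),))))
    return R


def check_compliance(flows: Dict[str, Flow], trace: Sequence[Sequence[Set[Event]]]):
    """Algorithm 1 over an abstracted trace; returns (M, h, i, peak count of M)."""
    M: Set[Scenario] = {()}
    peak = len(M)
    for h, E_vec in enumerate(trace):
        for i, E_i in enumerate(E_vec):
            nxt: Set[Scenario] = set()
            for e in E_i:          # alternative interpretations of one link state
                for scen in M:
                    nxt |= analysis(flows, scen, e, h)
            if not nxt:            # no scenario explains link i at step h
                return M, h, i, peak
            M = nxt
            peak = max(peak, len(M))
    return M, -1, -1, peak


def write_flow(x: int) -> Flow:
    # Hypothetical 3-step write flow of CPU_x; (Mem, Bus, rd_resp) is shared by both CPUs.
    return Flow(("p0",), (
        (frozenset({"p0"}), frozenset({"p1"}), (f"CPU_{x}", f"Cache_{x}", "wr_req")),
        (frozenset({"p1"}), frozenset({"p2"}), ("Mem", "Bus", "rd_resp")),
        (frozenset({"p2"}), frozenset({"p3"}), (f"Cache_{x}", f"CPU_{x}", "wr_resp")),
    ))


flows = {"wr0": write_flow(0), "wr1": write_flow(1)}
rd = ("Mem", "Bus", "rd_resp")
trace = [[{("CPU_0", "Cache_0", "wr_req")}], [{("CPU_1", "Cache_1", "wr_req")}],
         [{rd}], [{rd}],
         [{("Cache_0", "CPU_0", "wr_resp")}], [{("Cache_1", "CPU_1", "wr_resp")}]]
M, h, i, peak = check_compliance(flows, trace)
print(h, i, peak, len(M))  # -1 -1 2 1
```

In this toy run the first (Mem:Bus:rd_resp) can advance either CPU's instance, so M peaks at two scenarios; the second occurrence makes both candidates converge, so the final count is one. Replacing a later event with one that no flow accepts would instead return the step and link index of the inconsistency.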
4. TRACE SIGNAL SELECTION
Trace signal selection is a critical step in post-silicon debug. It includes two different efforts: pre-silicon and post-silicon. During pre-silicon selection, a few thousand signals among a vast number of internal signals are tapped for observation. All necessary signals must be selected at this stage; otherwise, an expensive re-design along with a silicon re-spin is required. During post-silicon debug, a small subset of those tapped signals is routed to the chip interface for tracing during system execution.

Previous work such as [4] is typically applied to gate-level design models, and the quality of the results is evaluated by the commonly used state restoration ratio. However, it is difficult to scale those methods to large and complex SoC designs. More importantly, signals selected at the gate level are often irrelevant to system-level functionality. There has been an attempt to raise the abstraction level for trace signal selection to the register transfer level (RTL), guided by assertions [11]; however, that work does not consider system-level functionality either. In [3], a system-level protocol guided approach is proposed. It is similar to our work in that both are based on system-level protocols. However, the selection techniques developed in [3] are simple and irrelevant to understanding silicon traces at the system level, and the evaluation was performed on an abstract transaction-level model.

This section introduces a framework, shown in Figure 3, for trace signal selection guided by system-level protocols. Due to the page limit, this paper only considers pre-silicon trace signal selection. Since the pre-silicon selection needs to support all types of execution scenarios, it is sufficient to consider only Type-3 scenarios, as they supersede Type-1 and Type-2 scenarios.
During system-level selection, different subsets of flow events are selected for observation from the given flow specifications. Those results are then passed to the more refined bit-level selection.
To support Type-3 scenarios, the start and end events of all flow specifications must be selected. If a flow specification has multiple branches, additional events may need to be selected so that the branch followed by a flow instance during system execution can be captured. Figure 4 shows two examples of different branching structures for flows.
• In Figure 4(a), each branch ends with a unique event. There is no need to select additional events, as observing the different end events clearly identifies the branch followed during system execution.
Figure 3: A framework of trace signal selection (flow specifications F → system-level selection → sets of flow events → bit-level selection, which also takes the RTL model → trace signals).
Figure 4: Examples of flow structures, panels (a) and (b).
• In Figure 4(b), branches split and then join, and the flow ends with a common event. In this case, a unique event needs to be selected for each branch.
The flow shown in Figure 2 has three branches with a structure similar to Figure 4(b). Its start and end events are {t, t, t, t}. Note that t, t, and t actually refer to the same event. There is no choice for the right branch, as t must be selected. To distinguish the left two branches from the right one, either t or t needs to be selected.
Similarly, one of the events in {t, t, t, t} needs to be selected in order to distinguish the left branch from the middle one. Therefore, all possible event selections for that flow are {t, t, t, t} × {t, t} × {t, t, t, t}.
Among the possibly large number of event selections, there are two types of events with interesting characteristics. One type includes events that are unique to specific flows (referred to as unique events), while the other type includes events shared by multiple flows (referred to as shared events). This section considers their impacts on the complexity and accuracy of the trace analysis and on the signal selection. For the complexity and accuracy of the trace analysis, only issues […] t and t are used only in that flow. During the trace analysis, they are simply mapped to the instances of that particular flow. Issue […] t and t have a similar characteristic.
On the other hand, events t and t are used in many different flows of different components; those flows can be read/write flows of CPU_0 or CPU_1. During the trace analysis, if there are multiple instances of such flows, it is impossible to know which of those flow instances caused those events to be generated. Therefore, the analysis algorithm has to map those events to those flow instances in all possible ways, which can severely degrade the complexity and accuracy of the trace analysis.
In terms of trace signal selection, these two types of events can lead to different results. If unique events are selected, the total number of selected events can be large, and as a result, a large number of trace signals must be selected to observe those events. On the other hand, the total number of events can be smaller if shared events are selected, which leads to fewer trace signals. The negative impacts of selecting shared events can also be mitigated if certain implementation details are available. The next section discusses this point further.
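The selection rules and the unique/shared trade-off can be explored with a small search. The sketch below is an illustration under stated assumptions, not the authors' tool: it represents each flow by its branches (event sequences from start to end event), treats a selection as adequate when every branch leaves a different sequence of observed events (ignoring interleaving with other flow instances), picks the fewest intermediate events per flow, and then combines flows to show how shared events shrink the total selection. All flow and event names are hypothetical.

```python
from itertools import combinations, product
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Branches = List[Sequence[str]]


def identifies_branches(branches: Branches, selected: Set[str]) -> bool:
    """True if every branch leaves a distinct sequence of observed events."""
    projections = [tuple(e for e in b if e in selected) for b in branches]
    return len(set(projections)) == len(projections)


def minimal_selections(branches: Branches) -> List[FrozenSet[str]]:
    """Start/end events plus the fewest intermediate events that identify all branches."""
    must = {b[0] for b in branches} | {b[-1] for b in branches}
    middle = sorted({ev for b in branches for ev in b} - must)
    for k in range(len(middle) + 1):
        found = [frozenset(must | set(extra))
                 for extra in combinations(middle, k)
                 if identifies_branches(branches, must | set(extra))]
        if found:
            return found
    return []  # branches are indistinguishable even under full observation


def global_selections(flows: Dict[str, Branches]) -> List[Tuple[FrozenSet[str], int]]:
    """Union one selection per flow; report each union with its count of shared events."""
    usage: Dict[str, Set[str]] = {}
    for name, branches in flows.items():
        for ev in {ev for b in branches for ev in b}:
            usage.setdefault(ev, set()).add(name)
    per_flow = [minimal_selections(b) for b in flows.values()]
    results = []
    for combo in product(*per_flow):  # Cartesian product of per-flow choices
        selection = frozenset().union(*combo)
        shared = sum(1 for ev in selection if len(usage[ev]) > 1)
        results.append((selection, shared))
    return results


# Hypothetical Figure 4(b)-style flows: a hit branch and a miss branch that
# rejoin at a common end event; "bus_rd" is shared by both CPUs' flows.
FLOWS = {
    "cpu0_rd": [["c0_req", "c0_hit", "c0_resp"], ["c0_req", "bus_rd", "c0_resp"]],
    "cpu1_rd": [["c1_req", "c1_hit", "c1_resp"], ["c1_req", "bus_rd", "c1_resp"]],
}
for sel, shared in sorted(global_selections(FLOWS), key=lambda r: len(r[0])):
    print(len(sel), "events,", shared, "shared:", sorted(sel))
```

In this toy example, the all-unique combination needs six events and no shared ones, whereas choosing `bus_rd` for both flows needs only five events at the cost of one shared event that the analysis must map ambiguously, mirroring the trade-off described above.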
The bit-level selection takes as inputs the set of event selections produced in the previous step and an RTL model that implements the system flow specifications, and performs two tasks for each event selection:
1. Evaluate its quality wrt the three issues discussed in Section 3.4;
2. Choose one selection, and generate a set of candidate trace signals that implement the selected events.
The ultimate goal of the bit-level selection is to produce a reduced set of candidate trace signals optimized for the trace analysis approach. Since the bit-level selection depends on implementation specifics, this section can only discuss some general guidelines and trade-offs. Note that flow specifications are typically independent of memory address and data information; therefore, the address and data bits included in event implementations can generally be ignored.
Signals that implement the Cmd field of flow events are selected based on their respective distinguishing power. Given a set of flow events E and a set of signals W that implement E, the distinguishing power of W_i ⊆ W is defined by how finely E can be partitioned wrt W_i. A finer partition means higher distinguishing power. For example, suppose two flow events on link (cpu0, DCache0) are implemented by eight signals with the following encodings:
(cpu0 : DCache0 : wr req) 0100 0000
(cpu0 : DCache0 : rd req) 1000 0000
Under these encodings, the six signals in the rightmost positions have zero distinguishing power, since they are 0 in both encodings. The two leftmost signals have equal power, so selecting either one would be fine. Selecting signals with high distinguishing power helps to address issue […] t or t are selected, observing tags is not needed.
2. Shared events t or t are selected along with tags.
For option 2, tags can help to map events to the flow instances with the same tags during the trace analysis, thus addressing issue […] p and p. When a flow instance reaches p, which branch to take next depends on whether the cache operation is a hit or a miss. Similarly, which branch to take at p depends on whether the cache snoop operation is a hit or a miss. If these two status signals are available and included for observation, there is no need to select branch events: observing the start/end events plus those status signals is sufficient to identify the branches followed by a flow instance during system execution.
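Distinguishing power can be computed directly from the encodings: group the events by the values they take on a candidate signal subset W_i and count the resulting partition blocks. The sketch below uses the number of blocks as a simple proxy for how fine the partition is (an assumption; the text only states that a finer partition is better) and indexes signals by string position, with 0 as the leftmost bit, since the original signal indices are not recoverable here. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def partition(encodings: Dict[str, str], positions: Sequence[int]) -> List[List[str]]:
    """Group events whose encodings agree on every selected bit position."""
    blocks: Dict[tuple, List[str]] = defaultdict(list)
    for event, code in encodings.items():
        blocks[tuple(code[p] for p in positions)].append(event)
    return list(blocks.values())


def distinguishing_power(encodings: Dict[str, str], positions: Sequence[int]) -> int:
    """Number of partition blocks induced by the selected positions."""
    return len(partition(encodings, positions))


# The two Cmd encodings on link (cpu0, DCache0) from the example above.
ENC = {"cpu0:DCache0:wr req": "01000000",
       "cpu0:DCache0:rd req": "10000000"}
for p in range(8):
    print(p, distinguishing_power(ENC, [p]))  # 2 for positions 0 and 1, else 1
```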
5. EXPERIMENTAL RESULTS
To the best of our knowledge, this work is the first to present a systematic approach to post-silicon trace analysis guided by system-level protocols. We were not able to find any similar previous work against which ours could be evaluated and compared. The closest work to ours is [15]. However, our work is more general and was developed with practical considerations in mind. Additionally, the work in [15] is discussed and evaluated on an abstract transaction-level model, while our approach is evaluated on an RTL model.
The ideas and techniques presented in this paper are evaluated on a multi-core SoC prototype, shown in Figure 1, which implements a number of common industrial system-level protocols, including cache coherence and power management. This prototype is a cycle- and pin-accurate RTL model written in VHDL. Even though this model is simple compared to real SoC designs, it is much more sophisticated than the gate-level benchmark suites typically considered as targets for post-silicon analysis [10, 4, 5].
Since the proposed trace analysis approach is communication-centric, the focus of this model is the implementation of system-level protocols. The CPUs are treated as a test environment in which software programs are simulated in VHDL to trigger various protocols. Therefore, there is no instruction cache, as no instructions are involved when the CPUs are simulated. The peripheral blocks (GFX, PMU, Audio, etc.) are also described as abstract models that generate events to initiate flows or to respond to incoming requests.
More details of some system-level protocols implemented in our model can be found in [3]. They include downstream read/write protocols for each CPU, upstream read/write protocols for the peripheral blocks, and system power management protocols, all abstracted from real industrial protocols. These system-level protocols are supported by inter-block communication protocols based on ARM AXI4-Lite [1]. A total of 16 flows are implemented for this prototype.
A flow event is generated by a source and consumed by a destination through messages transmitted over the link between them. In our model, each message is organized as follows.
⟨Val( ), Cmd( ), Tag( ), Sid( ), Addr( ), Data( )⟩
The meanings of the message fields are given below. The numbers following the individual fields indicate their respective widths. Note that not all fields are used on all links. The model has over four thousand single-bit signals.
Val indicates the validity of a message.
Cmd carries the operation to be performed by the target block.
Tag is used by Bus to identify the original sources of messages from different blocks that go to the same destination, e.g., memory wr_req from Bus in response to wr_req from both CPUs.
Sid is a unique number generated by a component to represent sequencing information of flows initiated by the same component.
Addr carries the memory address at the target block where Cmd is applied.
Data carries data to a target or from a source. Its width can vary depending on the link on which a message is sent. On the links between Cache and Bus, the width is equal to the size of a cache block, which is 64 bytes. For all the other links, the width is 32 bits.
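How a traced Tag reduces the ambiguity of shared events can be sketched as a candidate filter: a shared event is matched against every active flow instance waiting for it, and a traced Tag narrows the candidates to instances initiated by the tagged source. The `Instance` and `Observed` types, and the assumption that a tag value directly names the initiating component, are illustrative and not the prototype's actual encoding.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    flow: str          # hypothetical flow name
    source: str        # component that initiated the flow instance
    waiting_for: str   # label of the next expected event


@dataclass
class Observed:
    label: str
    tag: Optional[str] = None  # None when Tag signals are not traced


def candidates(active: List[Instance], ev: Observed) -> List[Instance]:
    """Instances that could have produced `ev`; a traced tag must match the source."""
    return [i for i in active
            if i.waiting_for == ev.label and (ev.tag is None or ev.tag == i.source)]


active = [Instance("cpu0_rd", "CPU_0", "Mem:Bus:rd resp"),
          Instance("cpu1_rd", "CPU_1", "Mem:Bus:rd resp")]
print(len(candidates(active, Observed("Mem:Bus:rd resp"))))           # 2
print(len(candidates(active, Observed("Mem:Bus:rd resp", "CPU_0"))))  # 1
```

Without the tag, both instances remain candidates and the analysis must fork; with it, the event maps to a single instance.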
Test Environment
The prototype is simulated in a random test environment where CPUs, GFX, and other peripheral blocks are programmed to randomly select a flow to initiate in each clock cycle. The contents of Cmd, Addr, and Data in each activated flow are set randomly. Additionally, CPUs can activate power management protocols non-deterministically. Each of these blocks activates a total of 100 flow instances during the entire simulation.
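A stimulus generator in the spirit of this test environment might look like the sketch below. The real environment is written in VHDL; this Python version only illustrates the policy described above, and the block and flow names, the idle probability `p_idle`, and the 32-bit word-aligned address and data ranges are hypothetical choices.

```python
import random
from typing import Dict, List, Tuple

# (cycle, block, flow, cmd, addr, data)
Stimulus = Tuple[int, str, str, str, int, int]


def generate_stimulus(blocks: Dict[str, List[str]], per_block: int = 100,
                      p_idle: float = 0.5, seed: int = 0) -> List[Stimulus]:
    """Each cycle, every block that has not yet issued `per_block` flows either
    stays idle (probability p_idle, a hypothetical knob) or initiates a randomly
    chosen flow with random Cmd, Addr and Data."""
    if not 0.0 <= p_idle < 1.0:
        raise ValueError("p_idle must be in [0, 1) for the loop to terminate")
    rng = random.Random(seed)
    issued = {b: 0 for b in blocks}
    out: List[Stimulus] = []
    cycle = 0
    while any(n < per_block for n in issued.values()):
        for block, flows in blocks.items():
            if issued[block] < per_block and rng.random() >= p_idle:
                out.append((cycle, block, rng.choice(flows),
                            rng.choice(["read", "write"]),
                            rng.randrange(0, 2**32, 4),  # word-aligned address
                            rng.getrandbits(32)))
                issued[block] += 1
        cycle += 1
    return out


BLOCKS = {"CPU_0": ["downstream_rw", "power"], "CPU_1": ["downstream_rw", "power"],
          "GFX": ["upstream_rw"]}
stim = generate_stimulus(BLOCKS, seed=1)
print(len(stim), stim[-1][0])  # total flows issued and the last active cycle
```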
Trace Signal Selection
In the experiments, different selections of trace signals are produced as discussed in Section 4, and their impacts on the complexity and accuracy of the trace analysis approach are evaluated. The list below explains the selections at the system level, while information on the bit-level selection is given in Table 1.
S1 All events of all flow specifications, and all signals implementing each event, are selected. This selection offers full observation and provides a baseline for comparison with the other selections.
S2 The start and end events of all protocols are selected. Furthermore, for each branch in each flow, one unique event is selected.
S3 The start and end events of all protocols are selected. Furthermore, for each branch in each flow, a highly shared event is selected.
S4 The start and end events of all protocols are selected. Instead of selecting events for branches in each flow, signals whose states control the flow branching are selected.
At the bit level, the Addr and Data fields are not considered. On the other hand, the Val bit is always selected so that valid messages can be identified from observed traces. For selections S2, S3, and S4, experiments are performed to evaluate all combinations of the Cmd, Tag and Sid fields.
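The bit-level configurations evaluated for S2, S3, and S4 can be enumerated mechanically. The short sketch below, with hypothetical helper names, lists every subset of the optional fields Cmd, Tag and Sid, always adds Val, and never includes Addr or Data, matching the rules just stated.

```python
from itertools import chain, combinations
from typing import List, Tuple

OPTIONAL_FIELDS = ("Cmd", "Tag", "Sid")


def field_subsets() -> List[Tuple[str, ...]]:
    """Every subset of the optional message fields, from none to all three."""
    return list(chain.from_iterable(
        combinations(OPTIONAL_FIELDS, k) for k in range(len(OPTIONAL_FIELDS) + 1)))


def experiment_grid() -> List[Tuple[str, Tuple[str, ...]]]:
    """(system-level selection, traced fields). Val is always traced;
    Addr and Data are never traced at the bit level."""
    return [(sel, ("Val",) + subset)
            for sel in ("S2", "S3", "S4")
            for subset in field_subsets()]


grid = experiment_grid()
print(len(grid))  # 24 = 3 selections x 2**3 field subsets
```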
In Table 1, a marked cell means that all signals implementing a particular field for all selected events in selection SX are traced; otherwise, none of those signals are traced. The third row […] Cmd or Sid has severe impacts on the trace analysis as explained in issues […] Tag has negative impacts, but not as severe: the trace analysis can still finish, even though it takes more time and memory. Next, compare the results obtained by selecting Cmd and Sid but not Tag under S2–S4. The results with S4 are much better than those with S2 or S3. This is because no branch events are selected for S4; therefore, issues […] Cmd, Tag and Sid are applied to all events as the result of the system-level selection. A finer selection can be used to reduce the number of trace signals if unique events and shared events are considered separately. For unique events, the sources where they are generated are known from the flows, so Tags need not be traced. Shared events may result from flow instances initiated by different components, so tracing Tags is necessary. On the other hand, tracing or not tracing Cmds has little impact on the trace analysis. These points are supported by the results shown in the columns under "U S". Under S2, compare the results under "U S" against those with all three fields selected: the runtime performance and the complexity and accuracy of the trace analysis are similar, while the number of trace signals is reduced with the finer selection. Comparing the results under "U S" against those obtained with only Cmd and Sid selected, the complexity drops significantly. The same conclusion can be drawn for S3 and S4.
From the above discussion, it is necessary to trace signals implementing Cmd and Sid whenever possible, and to trace as many signals implementing Tag as allowed to reduce the complexity of the trace analysis even further. If Tag or Sid is not available as part of the design, we recommend adding DFx circuitry to trace such information. In the above experiments, the final execution scenarios under the different signal selections, where available, contain the correct number of initiated flow instances, and the orderings among the flow instances, as generated by the test environment, are correctly captured.
Table 1: Runtime results of trace analysis with different trace signal selections, grouped by system-level selection (S1, S2, S3, S4, with "U S" sub-columns for S2–S4) and by the traced fields (Cmd, Tag, Sid). Runtime is in seconds and memory usage is in MB. − indicates that results are not available because the 10-minute time limit was exceeded.
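The signal saving of the finer "U S" selection can be estimated with a rough count. The sketch below assumes, hypothetically, that each field occupies a fixed number of signals on every link (the actual field widths are not given here, and not every field exists on every link), that Val, Cmd and Sid are traced on every link carrying a selected event, and that Tag is traced either on all such links or, in the finer selection, only on links carrying a shared event. Link names, event labels and widths are all illustrative.

```python
from typing import Dict, Iterable, Set, Tuple

Event = Tuple[str, str]  # (link, event label)


def traced_signals(selected: Iterable[Event], shared: Set[Event],
                   widths: Dict[str, int], finer: bool) -> int:
    """Count trace signals for a selection; with finer=True, Tag signals are
    kept only on links that carry at least one shared event."""
    link_has_shared: Dict[str, bool] = {}
    for ev in selected:
        link = ev[0]
        link_has_shared[link] = link_has_shared.get(link, False) or ev in shared
    total = 0
    for has_shared in link_has_shared.values():
        total += widths["Val"] + widths["Cmd"] + widths["Sid"]
        if not finer or has_shared:
            total += widths["Tag"]
    return total


WIDTHS = {"Val": 1, "Cmd": 4, "Tag": 2, "Sid": 4}  # hypothetical widths
SELECTED = [("CPU0->Cache0", "rd req"), ("Cache0->CPU0", "rd resp"),
            ("Mem->Bus", "rd resp")]
SHARED = {("Mem->Bus", "rd resp")}
print(traced_signals(SELECTED, SHARED, WIDTHS, finer=False))  # 33
print(traced_signals(SELECTED, SHARED, WIDTHS, finer=True))   # 29
```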
6. RELATED WORK
Our work is closely related to communication-centric and transaction-based debug. Early pioneering work is described by Goossens et al. [9, 14, 8], which advocates focusing on observing activities on the interconnect network among IP blocks and mapping these activities to transactions for better correlation between computations and communications. A similar transaction-based debug approach is presented by Gharehbaghi and Fujita [7]. It proposes automated extraction of transaction-level state machines from high-level design models. From an observed failure trace, it tries to derive a set of feasible transaction traces that lead to the observed failure state. However, this approach requires manual inputs and may not be able to derive such traces. Singerman et al. [13] deploy a central repository of system events and simple transactions defined by architects and IP designers. It spans a wide spectrum of post-silicon validation, including DFx instrumentation, test generation, coverage, and debug. Also, Abarbanel et al. [2] propose a model at a higher level of abstraction, called flows. Flows are used to specify more sophisticated cross-IP transactions, such as power management and security, and to facilitate reuse of the efforts of architectural analysis to check HW/SW implementations.
7. CONCLUSION
An improved trace analysis approach for post-silicon debug is presented in which observed raw silicon traces are interpreted wrt system flow specifications. In this approach, a new formulation of flow execution scenarios is described in which more diverse information among flows can be captured and represented. A trace signal selection framework is also described in support of the proposed trace analysis approach. Some observations on trace signal selections and their impacts on the accuracy and efficiency of the trace analysis are discussed. Experiments on a non-trivial SoC prototype reveal insights into the impacts of different signal selections on the complexity and accuracy of the trace analysis. In the future, we plan to perform a more extensive and in-depth study of trace signal selections guided by system flow specifications.
8. REFERENCES
Proceedings of DAC’14, pages 2:1–2:4, 2014.
[3] M. Amrein. System-level trace signal selection for post-silicon debug using linear programming. Master's thesis, Univ. of Illinois Urbana-Champaign, May 2015.
[4] K. Basu and P. Mishra. Efficient trace signal selection for post silicon validation and debug. In VLSI Design (VLSI Design), pages 352–357. IEEE, 2011.
[5] D. Chatterjee, C. McCarter, and V. Bertacco. Simulation-based signal selection for state restoration in silicon debug. In ICCAD, pages 595–601. IEEE, 2011.
[6] H. D. Foster. Trends in functional verification: A 2014 industry study. In DAC, pages 48:1–48:6, 2015.
[7] A. M. Gharehbaghi and M. Fujita. Transaction-based post-silicon debug of many-core system-on-chips. In ISQED, pages 702–708, 2012.
[8] K. Goossens, B. Vermeulen, and A. B. Nejad. A high-level debug environment for communication-centric debug. In Proceedings of DATE'09, pages 202–207, 2009.
[9] K. Goossens, B. Vermeulen, R. v. Steeden, and M. Bennebroek. Transaction-based communication-centric debug. In Proceedings of NOCS'07, pages 95–106, 2007.
[10] H. F. Ko and N. Nicolici. Algorithms for state restoration and trace-signal selection for data acquisition in silicon debug. IEEE TCAD, 28(2):285–297, 2009.
[11] S. Ma, D. Pal, R. Jiang, S. Ray, and S. Vasudevan. Can't see the forest for the trees: State restoration's limitations in post-silicon trace signal selection. In ICCAD '15, pages 1–8, Piscataway, NJ, USA, 2015. IEEE Press.
[12] P. Patra. On the cusp of a validation wall. IEEE Des. Test, 24(2):193–196, Mar. 2007.
[13] E. Singerman, Y. Abarbanel, and S. Baartmans. Transaction based pre-to-post silicon validation. In Proceedings of DAC'11, pages 564–568, 2011.
[14] B. Vermeulen and K. Goossens. A NoC monitoring infrastructure for communication-centric debug of embedded multi-processor SoCs. In VLSI-DAT '09, pages 183–186, 2009.
[15] H. Zheng, Y. Cao, S. Ray, and J. Yang. Protocol-guided analysis of post-silicon traces under limited observability. In Proceedings of ISQED'16, pages 301–306, March 2016.
APPENDIX
A. CPU READ/WRITE DOWNSTREAM PROTOCOL
X = {0, 1}, X' = 1 − X
Target = {Memory, USB, UART, AUDIO, GFX}
CMD = {read, write}
Figure 5: LPN formalization of a CPU read/write protocol.
Figure 6: MSQ of CPU downstream read/write protocol.
Figure 5: LPN formalization of a CPU read/writeprotocol. Figure 6: MSQ of CPU downstream read/write pro-tocol . UPSTREAM READ/WRITE PROTOCOL
Initiator= { GFX, USB, AUDIO, UART } Target= { Memory, USB, UART, AUDIO, GFX } Note that a peripheral can’t initialize a read flow to readitself
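To make the parameterization concrete, the Python sketch below expands the downstream read/write flow into one concrete message sequence per choice of X, Target, and CMD. It is an illustration under stated assumptions, not the authors' tooling: the tuple encoding and the helper `downstream_flow` are hypothetical, the snoop is assumed to go to the peer cache X' = 1 - X, and only the linear path through the bus is kept, so any branching in the LPN of Figure 5 is ignored.

```python
from itertools import product

# Parameter domains, as listed for the downstream protocol.
X_VALUES = (0, 1)
TARGETS = ("Memory", "USB", "UART", "AUDIO", "GFX")
CMDS = ("read", "write")

Message = tuple[str, str, str]  # (source, destination, command)


def downstream_flow(x: int, target: str, cmd: str) -> list[Message]:
    """Hypothetical linear encoding of one downstream read/write flow.

    Only the path through the bus is modeled; LPN branches are ignored.
    """
    peer = 1 - x  # X' = 1 - X: assumed to be the peer cache that is snooped
    return [
        (f"CPU{x}", f"Cache{x}", f"{cmd} req"),
        (f"Cache{x}", f"Cache{peer}", f"snp {cmd} req"),
        (f"Cache{peer}", f"Cache{x}", f"snp {cmd} resp"),
        (f"Cache{x}", "Bus", f"{cmd} req"),
        ("Bus", target, f"{cmd} req"),
        (target, "Bus", f"{cmd} resp"),
        ("Bus", f"Cache{x}", f"{cmd} resp"),
        (f"Cache{x}", f"CPU{x}", f"{cmd} resp"),
    ]


# One concrete flow instance per (X, Target, CMD) combination.
instances = {
    (x, t, c): downstream_flow(x, t, c)
    for x, t, c in product(X_VALUES, TARGETS, CMDS)
}

print(len(instances))                    # 20 (= 2 * 5 * 2)
print(instances[(0, "USB", "read")][1])  # ('Cache0', 'Cache1', 'snp read req')
```

Even this small protocol yields 20 instances, which suggests why matching a partially observed trace against all of them by hand is impractical.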
B. UPSTREAM READ/WRITE PROTOCOL
Initiator = {GFX, USB, AUDIO, UART}
Target = {Memory, USB, UART, AUDIO, GFX}
Note that a peripheral cannot initiate a read flow that targets itself.
[Figure 7 image: LPN whose transitions are labeled (Initiator : Bus : rd req), (Bus : Cache_0 : snp req), (Cache_0 : Cache_1 : snp req), (Cache_1 : Cache_0 : snp resp), (Cache_0 : Bus : snp resp), (Bus : Target : rd/wt req), (Target : Bus : rd/wt resp), (Bus : Initiator : rd resp)]
Figure 7: LPN formalization of an upstream read protocol.
Figure 8: MSQ of upstream read protocol.
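The self-targeting restriction prunes the parameter space of the upstream read flow. The sketch below counts the legal (Initiator, Target) instantiations; `legal_pairs` is a hypothetical helper name, and the domains are copied from the definitions above.

```python
from itertools import product

READ_INITIATORS = ("GFX", "USB", "AUDIO", "UART")
TARGETS = ("Memory", "USB", "UART", "AUDIO", "GFX")


def legal_pairs(initiators, targets):
    """(initiator, target) pairs allowed by the rule that a peripheral
    cannot initiate a flow that targets itself."""
    return [(i, t) for i, t in product(initiators, targets) if i != t]


read_pairs = legal_pairs(READ_INITIATORS, TARGETS)
print(len(read_pairs))               # 16 (= 4 * 5 - 4 self-targeting pairs)
print(("USB", "USB") in read_pairs)  # False
```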
Initiator = {GFX, AUDIO}
Target = {Memory, USB, UART, AUDIO, GFX}
Note that a peripheral cannot initiate a write flow that targets itself.
[Figure 9 image: LPN whose transitions are labeled (Initiator : Bus : wr req), (Bus : Cache_0 : snp wr req), (Cache_0 : Cache_1 : snp wr req), (Cache_1 : Cache_0 : snp wr resp), (Cache_0 : Bus : snp wr resp), (Bus : Target : wr req), (Target : Bus : wr resp), (Bus : Initiator : wr resp)]
Figure 9: LPN formalization of an upstream write protocol.
Figure 10: MSQ of upstream write protocol.
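Under limited observability, only messages on traced channels appear in a silicon trace. As a hedged illustration, the sketch below encodes the upstream write flow linearly and projects it onto an assumed set of traced (source, destination) channels. The helpers `upstream_write_flow` and `project` and the particular traced set are hypothetical choices for this example.

```python
Message = tuple[str, str, str]  # (source, destination, command)


def upstream_write_flow(initiator: str, target: str) -> list[Message]:
    """Hypothetical linear encoding of the upstream write flow."""
    if initiator == target:
        raise ValueError("a peripheral cannot initiate a write to itself")
    return [
        (initiator, "Bus", "wr req"),
        ("Bus", "Cache0", "snp wr req"),
        ("Cache0", "Cache1", "snp wr req"),
        ("Cache1", "Cache0", "snp wr resp"),
        ("Cache0", "Bus", "snp wr resp"),
        ("Bus", target, "wr req"),
        (target, "Bus", "wr resp"),
        ("Bus", initiator, "wr resp"),
    ]


def project(flow: list[Message], traced: set[tuple[str, str]]) -> list[Message]:
    """Keep only the messages whose (source, destination) channel is traced."""
    return [m for m in flow if (m[0], m[1]) in traced]


# Assumption: only the bus-side channels of GFX and Memory are traced.
traced = {("GFX", "Bus"), ("Bus", "Memory"), ("Memory", "Bus"), ("Bus", "GFX")}
flow = upstream_write_flow("GFX", "Memory")
print(project(flow, traced))
```

With this signal selection, the four snoop messages are invisible, so the trace alone cannot reveal whether the snoop sub-flow behaved correctly; this is the kind of trade-off a protocol-guided signal selection has to weigh.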
C. CPU WRITE BACK PROTOCOL
X = { }
Target = {Memory, USB, UART, AUDIO, GFX}
[Figure 11 image: LPN whose transitions are labeled (Cache_X : Bus : wb req), (Bus : Target : wb req), (Target : Bus : wb resp)]
Figure 11: LPN formalization of a CPU write-back protocol.
Figure 12: MSQ of CPU write-back protocol.
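Because the write-back flow is a simple chain of three messages, it is a convenient place to show how a labeled Petri net can replay a trace. The sketch below is a minimal, hypothetical encoding (class `LabeledPetriNet`, helpers `write_back_net` and `accepts`, and place names `p0` to `p3` are all illustrative); it assumes, as in the downstream protocol, that X indexes a core, and it supports only single-input, single-output transitions.

```python
from dataclasses import dataclass, field

Message = tuple[str, str, str]  # (source, destination, command)


@dataclass
class LabeledPetriNet:
    """Minimal labeled Petri net: each transition moves one token from its
    input place to its output place and is labeled with a message."""
    transitions: dict[str, tuple[str, str, Message]]  # name -> (pre, post, label)
    marking: dict[str, int] = field(default_factory=dict)

    def enabled(self, msg: Message) -> list[str]:
        """Transitions labeled msg whose input place holds a token."""
        return [name for name, (pre, _, label) in self.transitions.items()
                if label == msg and self.marking.get(pre, 0) > 0]

    def fire(self, name: str) -> None:
        pre, post, _ = self.transitions[name]
        self.marking[pre] -= 1
        self.marking[post] = self.marking.get(post, 0) + 1


def write_back_net(x: int, target: str) -> LabeledPetriNet:
    """Hypothetical net for one write-back flow from Cache_x to target."""
    return LabeledPetriNet(
        transitions={
            "t1": ("p0", "p1", (f"Cache{x}", "Bus", "wb req")),
            "t2": ("p1", "p2", ("Bus", target, "wb req")),
            "t3": ("p2", "p3", (target, "Bus", "wb resp")),
        },
        marking={"p0": 1},
    )


def accepts(net: LabeledPetriNet, trace: list[Message]) -> bool:
    """Replay trace; succeed if every message fires an enabled transition
    and the token ends in the final place p3."""
    for msg in trace:
        ready = net.enabled(msg)
        if not ready:
            return False
        net.fire(ready[0])  # in this chain, at most one transition is enabled
    return net.marking.get("p3", 0) == 1


trace = [("Cache0", "Bus", "wb req"), ("Bus", "Memory", "wb req"),
         ("Memory", "Bus", "wb resp")]
print(accepts(write_back_net(0, "Memory"), trace))        # True
print(accepts(write_back_net(0, "Memory"), trace[::-1]))  # False
```

A fresh net is built for each replay because `fire` mutates the marking.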
D. CPU POWER ON/OFF PROTOCOL
CMD = {pwr on, pwr off}
X = { }
Target = {Memory, USB, UART, AUDIO, GFX}
[Figure 13 image: LPN whose transitions are labeled (CPU_X : Cache_X : CMD req), (Cache_X : Bus : CMD req), (Bus : PWR : CMD req), (PWR : Target : CMD req), (Target : PWR : CMD resp), (PWR : Bus : CMD resp), (Bus : Cache_X : CMD resp), (Cache_X : CPU_X : CMD resp)]
Figure 13: LPN formalization of a CPU power on/off protocol.
Figure 14: MSQ of power on/off protocol.
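In the power on/off flow, the messages between Bus, PWR, and Target carry no CPU index, so when two cores issue the same command to the same target, an interleaved trace admits several interpretations. The sketch below enumerates every assignment of trace messages to flow instances by exhaustive backtracking. It is a naive illustration of "the set of all potential interpretations," not the paper's algorithm. `power_flow` and `interpretations` are hypothetical names, flows are assumed linear, and X is assumed to index a core.

```python
Message = tuple[str, str, str]  # (source, destination, command)


def power_flow(x: int, target: str, cmd: str) -> list[Message]:
    """Hypothetical linear encoding of the CPU power on/off flow."""
    return [
        (f"CPU{x}", f"Cache{x}", f"{cmd} req"),
        (f"Cache{x}", "Bus", f"{cmd} req"),
        ("Bus", "PWR", f"{cmd} req"),
        ("PWR", target, f"{cmd} req"),
        (target, "PWR", f"{cmd} resp"),
        ("PWR", "Bus", f"{cmd} resp"),
        ("Bus", f"Cache{x}", f"{cmd} resp"),
        (f"Cache{x}", f"CPU{x}", f"{cmd} resp"),
    ]


def interpretations(trace: list[Message],
                    flows: list[list[Message]]) -> list[tuple[int, ...]]:
    """Every assignment of trace messages to flow instances in which each
    instance consumes its own messages in order and runs to completion.
    Entry k of a result is the index of the instance that produced trace[k].
    Exhaustive search: exponential in the worst case."""
    results = []

    def search(pos: int, progress: list[int], assignment: list[int]) -> None:
        if pos == len(trace):
            if all(p == len(f) for p, f in zip(progress, flows)):
                results.append(tuple(assignment))
            return
        for i, flow in enumerate(flows):
            if progress[i] < len(flow) and flow[progress[i]] == trace[pos]:
                progress[i] += 1
                assignment.append(i)
                search(pos + 1, progress, assignment)
                assignment.pop()
                progress[i] -= 1

    search(0, [0] * len(flows), [])
    return results


# Both CPUs power on USB; the untagged PWR-side messages interleave pairwise.
f0 = power_flow(0, "USB", "pwr on")
f1 = power_flow(1, "USB", "pwr on")
trace = f0[:2] + f1[:2]
for k in range(2, 6):
    trace += [f0[k], f1[k]]
trace += f0[6:] + f1[6:]
print(len(interpretations(trace, [f0, f1])))  # 16
```

Each of the four identical message pairs can be attributed to either core, giving 2^4 = 16 interpretations. This blow-up is why the choice of traced signals affects both the complexity and the precision of the analysis.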