Boosting the Bounds of Symbolic QED for Effective Pre-Silicon Verification of Processor Cores

Karthik Ganesan, Srinivasa Shashank Nuthakki

Abstract

Existing techniques to ensure functional correctness and hardware trust during pre-silicon verification face severe limitations. In this work, we systematically leverage two key ideas: 1) Symbolic Quick Error Detection (Symbolic QED or SQED), a recent bug detection and localization technique using Bounded Model Checking (BMC); and 2) symbolic starting states, to present a method that: i) effectively detects both "difficult" logic bugs and Hardware Trojans, even with long activation sequences where traditional BMC techniques fail; and ii) does not need skilled manual guidance for writing testbenches, writing design-specific assertions, or debugging spurious counterexamples. Using open-source RISC-V cores, we demonstrate the following: 1. Quick (<5 minutes for an in-order scalar core and <2.5 hours for an out-of-order superscalar core) detection of 100% of hundreds of logic bug and Hardware Trojan scenarios from commercial chips and the research literature, and of 97.9% of "extremal" bugs (randomly generated bugs requiring ~100,000 activation instructions taken from random test programs). 2. Quick (~1 minute) detection of several previously unknown bugs in open-source RISC-V designs.


Index Terms: Formal Verification, Symbolic Quick Error Detection, Hardware Trojans, Pre-silicon Verification, Bounded Model Checking.

I. INTRODUCTION

Pre-silicon verification requires major effort in a typical hardware design flow [1]. In this paper, we consider pre-silicon verification of single processor cores, which are critical components of any System-on-Chip (SoC). Generally, pre-silicon verification mainly targets logic design errors (logic bugs). However, it is also crucial to detect Hardware Trojans (HTs) [2], which are unauthorized modifications of a system that result in incorrect functionality and/or the exposure of sensitive data [3]. While previous research on HTs focused on attacks implemented during fabrication [4], there is growing concern about HTs being inserted into third-party Intellectual Property (IP) cores by malicious entities [5]. This makes HT detection during pre-silicon verification essential. Like logic bugs, HTs can affect the functionality of a system. For example, an HT can cause an error that changes the software-visible state of a system, defined by the state of software-visible registers and memory. The objective of HT detection is to detect these changes, which encompass many catastrophic attacks on processor cores [2]. Symbolic quick error detection (Symbolic QED or SQED) [6] is a new pre-silicon verification technique based on QED tests [7]. It uses bounded model checking (BMC) [8] for formal analysis of the design. QED tests generate short sequences of instructions that trigger logic bugs in a design. As such, SQED is an automatic bug detection and localization technique that is extremely effective in practice. For example, SQED was recently applied to several industrial microcontroller cores used in commercial automotive products [9]. It detected all recorded logic bugs in the designs, while enabling an 8-60X (depending on the design) reduction in verification effort compared to the standard industrial verification flow. Importantly, SQED does not target single-instruction bugs (i.e., bugs such that a single instruction on a specific set of inputs always produces an incorrect result). Many other techniques, from both the research literature [10] and industry [9], are highly effective at detecting such bugs. SQED analyzes a design symbolically, but it requires a concrete starting state (i.e., a state of the digital system that is given explicitly as a bit-vector of 0s and 1s). Consequently, to find bugs or HTs that require long activation sequences (i.e., many instructions are required to activate them), Symbolic QED must rely on very deep BMC runs (i.e., runs that unroll the system far enough to include all the activation instructions). This can be very difficult for practical designs. In a related study [6], BMC could unroll a large multicore SoC only up to around 30 clock cycles within 24 hours of verification time. The following example [11] shows that SQED, while highly effective for logic bugs, can be insufficient for HTs.

Manuscript received Jan. 10, 2020. K. Ganesan and S. S. Nuthakki are with the Dept. of Electrical Engineering, Stanford University, Stanford, CA 94305 USA. K. Ganesan and S. S. Nuthakki contributed equally. An early draft is cited in [9] as [Ganesan 18]. Sections III.A and IV.A of this manuscript were also re-used in an invited paper at ICCAD 2019 [182].

Motivating Example 1.

Consider the following HT that is difficult to find using existing HT detection techniques:

The HT changes opcodes of the next several decoded instructions if the processor has fetched a specific sequence of 256 instructions.

This HT could inject an instruction sequence to bypass physical memory protection and run a privileged instruction. Such privilege escalation attacks [2] can be catastrophic.
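To make the depth problem concrete, the following toy model may help. It is a minimal sketch under stated assumptions: the trigger finite-state machine, the 8-opcode binary sequence standing in for the 256-instruction one, and the names SECRET, step, and reaches_activation are all hypothetical, and the search is plain explicit enumeration rather than a real BMC engine. From the reset state, no search bound shorter than the whole trigger can activate the HT, whereas a search allowed to start from any trigger state needs a single step.

```python
from itertools import product
from typing import Iterable

# Hypothetical 8-opcode trigger over a 2-symbol alphabet. It is a scaled-down
# stand-in for the 256-instruction activation sequence of Motivating Example 1.
SECRET = [1, 0, 1, 1, 0, 0, 1, 0]
ACTIVE = len(SECRET)  # trigger-FSM state in which the HT is active


def step(progress: int, opcode: int) -> int:
    """Advance the trigger FSM by one fetched opcode (simplified matcher)."""
    if progress == ACTIVE:
        return ACTIVE  # once active, the HT stays active
    if opcode == SECRET[progress]:
        return progress + 1
    return 0  # any mismatch restarts matching from the beginning


def reaches_activation(start_states: Iterable[int], bound: int) -> bool:
    """Explicit-state bounded search (a toy stand-in for BMC): can any of the
    start states activate the HT within `bound` fetched opcodes?"""
    for s0 in start_states:
        for seq in product((0, 1), repeat=bound):
            s = s0
            for op in seq:
                s = step(s, op)
            if s == ACTIVE:  # activation is sticky, so checking the end suffices
                return True
    return False


# Concrete reset state (progress 0): the bound must cover the whole trigger.
print(reaches_activation([0], len(SECRET) - 1))  # False
print(reaches_activation([0], len(SECRET)))      # True
# Symbolic start: the search may pick any not-yet-active trigger state.
print(reaches_activation(range(ACTIVE), 1))      # True
```

The price of the symbolic start is that the search may also pick states that no real execution reaches, which is the false-positive issue addressed by the QED constraints.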

Because the HT requires a long sequence of instructions (and hence many clock cycles) for activation, SQED (like other BMC-based methods [12, 13]) fails to detect the HT unless the selected starting state for BMC quickly transitions to a state where the HT activates. Stumbling upon such a “close” state from a concrete starting state (e.g., one obtained from simulation or a power-on reset state) is highly unlikely, since the HT can be designed with an arbitrary activation sequence that is not known a priori. To overcome this major challenge, we extend SQED so that it can start from a symbolic (instead of concrete) starting state (i.e., we give the BMC tool the ability to choose an arbitrary starting state for each run). However, it is well known that starting BMC from an unrestricted symbolic starting state risks generating spurious counterexamples (false positives). A false positive occurs when the BMC tool indicates that a bug or HT is present in a design when there is actually none. If the BMC tool selects a starting state that is not reachable from the set of all reset states of the system via a sequence of instructions, then a false positive might occur. For example, assume that each word in a memory system is protected with a single even parity bit, and assume that a BMC tool is asked to check the following property: for any sequence of reads and writes to the memory, the parity bits remain consistent with the data. If the starting state of the design is not constrained, then the BMC tool can initialize the memory to contain an all-zero word with a ‘1’ for the parity bit, issue an instruction that reads from this location, check the property, and report this false positive. Traditional methods rely on verification engineers to manually create constraints that rule out such false positives, which can be time-consuming for practical designs with many complex properties. This paper overcomes that challenge by: i) defining QED constraints: sufficient constraints (see Section III for details and Appendix B for proofs) to ensure that false positives do not occur when using Symbolic QED with symbolic starting states; and ii) introducing QED recorders, which observe a small subset of internal signals within the processor to ensure that the QED constraints are satisfied. QED recorders are used for pre-silicon verification only, so they incur no area overhead in the final design. Our work improves on a previous technique for SQED with symbolic initial states, called S²QED [14]. That approach differs from ours in the types of processors that can be verified, the types of logic bugs and HTs that can be detected, and the way symbolic initial states are implemented. We explain our advantages in Section II.C. Experimental results using our new method demonstrate:

1. We automatically, correctly, and quickly (~1 minute) detect several previously unknown (real) logic bugs in open-source out-of-order (OoO) superscalar [15] and in-order scalar [16] RISC-V cores. The bugs found in [15] cannot be detected by [6] or [14].
2. We automatically, correctly, and quickly (within 25 seconds for in-order, 18 minutes for OoO) detect 100% of (117 in-order, 120 OoO) simulated logic bugs, representing a wide variety of “difficult” logic bugs (Appendix A) from commercial designs. SQED with a concrete starting state detects only 33% (in-order) and 5% (OoO).
3. We automatically, correctly, and quickly (within 5 minutes for an in-order core; 2 hours for an OoO superscalar core) detect 100% of (156 in-order, 195 OoO) simulated HTs, encompassing a wide variety of scenarios (Appendix A) from over 140 papers in the HT research literature. SQED with a concrete starting state detects 15% (in-order) and 9% (OoO).
4. We automatically, correctly, and quickly (within 2.5 hours) detect 97.9% of an “extremal” bug family (randomly generated, precondition-based bugs which require ~100,000 activation instructions taken from random test programs) in an OoO superscalar core [15]. In contrast, SQED with a concrete starting state detects 0%, and [14] is not applicable.

Important features of our new technique are:

1. It is highly effective for detecting both logic bugs and HTs (despite long activation sequences) during pre-silicon verification of in-order and OoO superscalar cores, as demonstrated by our results.
2. It does not require the verification engineer to manually craft design-specific assertions to detect logic bugs or HTs.
3. No false positives occurred in our experiments.
4. It does not require a golden model or simulation data of the design-under-test for detection of logic bugs and/or HTs.
5. Its effectiveness does not depend on the way HTs are designed, i.e., our method is HT-design agnostic.

The rest of this paper is organized as follows. Section II provides background on earlier QED works. Section III describes Symbolic QED with symbolic starting states. Results are presented in Section IV, followed by a survey of related work in Section V and conclusions in Section VI. Appendix A lists the logic bug and HT types used in the experiments of Section IV. Appendix B provides formal proofs of the sufficiency of the QED constraints (introduced in Section III.B). Appendix C details how the QED constraints are specified to the BMC tool.

II. BACKGROUND

In the following, we present the basics and terminology of QED [7], SQED [6], and S²QED [14].

A. QED and the EDDI-V Transformation

Quick error detection (QED) is a testing technique that takes existing system validation tests (i.e., sequences of instructions) and automatically transforms them into a set of new tests using various QED transformations [7]. Among the transformations that can be applied, Error Detection using Duplicated Instructions for Validation (EDDI-V) is the focus of our work (illustrated in Fig. 1). It targets bugs inside processor cores by checking the results of original instructions against the results of duplicate instructions. First, the software-visible register and memory space is divided into two halves, one for the original instructions and one for the duplicate instructions. Next, corresponding registers and memory locations for the original and duplicate instructions are initialized to hold the same values. This is called a QED-consistent system state. Then, for every load, store, arithmetic, logical, shift, or move instruction in the original test, EDDI-V creates a corresponding duplicate instruction that performs the same operation but uses only the registers and memory reserved for the duplicates. Each duplicate instruction executes after its original (and the duplicates keep the relative order of the originals), but the two subsequences may be interleaved. The EDDI-V transformation then inserts periodic check instructions that compare the results of the original instructions against those of the duplicate ones. A QED test fails if, after an equal number of original and duplicate instructions have committed, the system reaches a state that is not QED-consistent. The corresponding starting state and instruction sequence constitute a counterexample, or QED-compatible bug trace.
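The transformation and the consistency check can be sketched on a toy register machine. This is a minimal illustration, not the paper's tooling: the tuple instruction format, the three-opcode ISA, and the helper names (dup_reg, duplicate, eddi_v, execute, qed_consistent) are hypothetical, and the register split (R1..R16 original, R17..R32 duplicate) is assumed from the R1 to R17 pairing in Fig. 1. The last line shows why the starting state must be QED-consistent: a mismatched start makes even a correct execution fail the check.

```python
from typing import Dict, List, Tuple

# Toy instruction format (hypothetical): (op, dst, src1, src2_or_imm).
# "addi" takes an immediate as its last field; "sub" and "mul" take registers.
Instr = Tuple[str, int, int, int]
N = 16             # assumed split: R1..R16 originals, R17..R32 duplicates
MASK = 0xFFFFFFFF  # 32-bit registers


def dup_reg(r: int) -> int:
    """Map original register R_a to its duplicate R_{a+N}."""
    assert 1 <= r <= N, "original instructions may only use original registers"
    return r + N


def duplicate(instr: Instr) -> Instr:
    op, dst, s1, s2 = instr
    s2_dup = s2 if op == "addi" else dup_reg(s2)  # immediates are copied unchanged
    return (op, dup_reg(dst), dup_reg(s1), s2_dup)


def eddi_v(originals: List[Instr]) -> List[Instr]:
    """Originals followed by their duplicates in the same relative order (Fig. 1a)."""
    return originals + [duplicate(i) for i in originals]


def execute(regs: Dict[int, int], program: List[Instr]) -> Dict[int, int]:
    """Reference (bug-free) interpreter for the toy ISA."""
    regs = dict(regs)
    for op, dst, s1, s2 in program:
        if op == "addi":
            regs[dst] = (regs[s1] + s2) & MASK
        elif op == "sub":
            regs[dst] = (regs[s1] - regs[s2]) & MASK
        elif op == "mul":
            regs[dst] = (regs[s1] * regs[s2]) & MASK
        else:
            raise ValueError(f"unsupported op {op}")
    return regs


def qed_consistent(regs: Dict[int, int]) -> bool:
    """The QED check: every original register equals its duplicate."""
    return all(regs[a] == regs[a + N] for a in range(1, N + 1))


# Fig. 1 program: R1 = R1 + 5; R2 = R2 - R1; R3 = R1 * R2
prog = [("addi", 1, 1, 5), ("sub", 2, 2, 1), ("mul", 3, 1, 2)]
test = eddi_v(prog)
start = {r: 7 for r in range(1, 2 * N + 1)}  # QED-consistent starting state
print(test[3:])  # [('addi', 17, 17, 5), ('sub', 18, 18, 17), ('mul', 19, 17, 18)]
print(qed_consistent(execute(start, test)))  # True

bad_start = dict(start, **{})
bad_start[17] = 0  # R17 != R1: not QED-consistent
print(qed_consistent(execute(bad_start, test)))  # False, although nothing is buggy
```

A real core would be exercised by the BMC tool over all instruction sequences and interleavings; here the interpreter is correct by construction, so only the inconsistent start can trigger a failure.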

Fig. 1. Example of EDDI-V transformation. a) A sequence of original and duplicate instructions; b-c) Alternate interleavings of originals and duplicates. R17, R18, and R19 are the duplicate registers of R1, R2, and R3, respectively.

[Fig. 1 content: original instructions R1 = R1 + 5; R2 = R2 - R1; R3 = R1 * R2 and duplicate instructions R17 = R17 + 5; R18 = R18 - R17; R19 = R17 * R18, shown in three orderings (a), (b), and (c).]

B. Symbolic QED

Symbolic QED [6] combines QED transformations with bounded model checking [8, 17] for pre-silicon verification of a design. SQED creates a BMC problem that checks all possible EDDI-V tests within a bounded number of clock cycles for a failing one. It searches for counterexamples to properties of the form $R_a == R_a'$. Here, $R_a$ is an original register, and $R_a'$ is the corresponding duplicate register in an EDDI-V test. To ensure that all possible counterexamples are QED-compatible: 1) original instructions must be valid instructions from the instruction set architecture (ISA) of the design; and 2) the instruction sequence must be an EDDI-V test. A QED module (a small hardware module that is only used during pre-silicon verification and does not incur area overhead for the final design) automatically transforms a sequence of original instructions into a QED-compatible sequence (e.g., as in Fig. 1). The QED module only requires that the input sequence consist of valid instructions that read or write only the original registers and memory (conditions that can be specified directly to the BMC tool). After execution, a signal (denoted below as $\mathrm{QED}_{\mathrm{ready}}$) is asserted. In a bug-free design, all original and corresponding duplicate registers should then contain the same values, i.e., the BMC tool checks that

$$\uparrow\left(\mathrm{QED}_{\mathrm{ready}}\right) \Rightarrow \bigwedge_{a \in \{1,\dots,N\}} \left(R_a == R_a'\right),$$

where $2N$ is the number of registers defined by the ISA. Here (for $a \in \{1,\dots,N\}$), $R_a$ and $R_a'$ correspond to original and duplicate registers, and $\uparrow(\mathrm{QED}_{\mathrm{ready}})$ is true on any clock edge where $\mathrm{QED}_{\mathrm{ready}}$ transitions from low to high. The starting state for the BMC run must also be QED-consistent, i.e., the value stored in each original register or memory location must match the corresponding duplicate register or memory location. This prevents spurious counterexamples from being generated. One way to obtain such a state is to run an EDDI-V test in simulation and stop immediately after the QED checks have compared all register and memory values. Symbolic QED can also detect HTs, if it finds an EDDI-V test for which the HT affects original and duplicate registers differently. For example, assume an HT is inserted (unknown to the designer) that activates when a 128-bit counter reaches its maximum value. Assume the HT changes an in-flight instruction to a NOP when it activates (cf. activation criterion A.2.a.2 ($X = 128$) and effect A.2.b.1 of Appendix A). If the counter is initialized to the value $2^{128}-1$, and the register file is initialized in a QED-consistent state, SQED can detect the HT using the EDDI-V test {ADDI R1, R1, 2; ADDI R17, R17, 2; CHECK R1==R17}. However, because the existence of the counter is unknown, it is impossible to pick the proper concrete starting state a priori. Instead, our approach to SQED with symbolic starting states (detailed in Section III) automatically detects this HT by starting the design at a state where {Counter = $2^{128}-1$; R1 = 0; R17 = 0} and running the above EDDI-V test.

C. S²QED

S²QED [14] is another technique for pre-silicon verification of processor cores; it extends SQED by incorporating symbolic initial states. Like the approach in this paper, S²QED focuses only on EDDI-V tests. S²QED instantiates two copies of the CPU-under-test, called CPU-1 and CPU-2. An arbitrary bijective mapping is then determined between the general-purpose registers and memory locations of the two CPU instances. A new notion of “QED-consistency” (which we refer to as S²QED-consistency) is achieved by any state where all values in the general- and special-purpose registers and memory locations of CPU-1 match the values in the mapped registers and memory locations of CPU-2. At the start of verification, both CPUs are initialized to an S²QED-consistent state. CPU-1 fetches an instruction called the instruction under verification (IUV), while CPU-2 fetches the corresponding S²QED duplicate instruction (i.e., the same instruction but with operand specifiers replaced by the corresponding register or memory locations for CPU-2). In the following clock cycles, CPU-1 is constrained to fetch NOPs until the IUV commits, while CPU-2 can fetch arbitrary valid instructions (treated as symbolic instructions by the BMC tool). S²QED then attempts to prove that, directly after the IUV and its S²QED duplicate commit, the CPUs remain S²QED-consistent. Thus, S²QED can prove that the model of the processor design is free of bugs of a specific type. However, S²QED [14] is unable to detect a (large) class of logic bugs and HTs that SQED [6] and the new method in this paper are able to detect:

Motivating Example 2.

When all registers in the general-purpose register file contain the same value, a logic bug or HT attack is triggered.

These bugs and HTs escape S²QED because they affect the original and duplicate CPUs equivalently. In contrast, our new approach detects them by creating scenarios where mismatches between original and duplicate instructions occur. To estimate how many distinct logic bugs and HTs fall into this class, note that the benchmark core [18] used in [14] has 16 distinct 32-bit general-purpose registers. There are at least $2^{32}$ distinct logic bug activation types (Appendix A; Table A.1.a.3 with R = 16) and at least $2^{32}$ distinct HT activation scenarios (Appendix A; Table A.2.a.3 with M = 16) that fall under Motivating Example 2 and are considered in our experiments in Section IV. Multiplying these by at least 3 distinct logic bug effects (Appendix A; Table A.1.b.1-3) and at least 4 distinct HT effects (Appendix A; Table A.2.b.1-4) yields at least 12 billion distinct logic bugs and at least 17 billion distinct HTs that would not be caught by S²QED but will be caught by our new technique. Another class of bugs and HTs also escapes S²QED, based on the design of the special-purpose register file in the CPUs.
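The bug and HT counts quoted for Motivating Example 2 follow from simple multiplication. The short check below assumes that each activation type corresponds to one common 32-bit value shared by all 16 registers, giving $2^{32}$ activation scenarios; that reading is our interpretation of the Appendix A tables rather than something this section spells out.

```python
# Worked check of the lower bounds for Motivating Example 2 (assumption:
# one activation scenario per shared 32-bit register value, i.e., 2**32).
activation_scenarios = 2 ** 32
logic_bug_effects = 3  # Table A.1.b.1-3
ht_effects = 4         # Table A.2.b.1-4

logic_bugs = activation_scenarios * logic_bug_effects
hts = activation_scenarios * ht_effects
print(f"{logic_bugs:,} logic bugs, {hts:,} HTs")
# 12,884,901,888 logic bugs, 17,179,869,184 HTs
```

Both products exceed the rounded figures of 12 billion and 17 billion stated in the text.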

Motivating Example 3.

When the registers in the special-purpose register file reach a particular concrete state, a logic bug or HT attack is triggered.

S²QED also requires the standard one-to-one mapping between the special-purpose registers of CPU-1 and CPU-2 to verify sequences containing special instructions that read from and/or write to those registers. Because the initial S²QED-consistent state requires that the special-purpose registers start with the same values, the bugs and HTs of Motivating Example 3 are also triggered in both copies of the design. This category can include tens of billions of counter-based Trojans (cf. Table A.2.a.2) that use the state of a special-purpose counter register (e.g., the cycle CSR in RISC-V [180] and the Tick Timer in OpenRISC [181]) to trigger HT attacks. Our approach also differs from S²QED in that it is especially suited to a broader class of processor designs, including OoO superscalar processors. This capability is enabled by the QED constraints we define (Section III.B), together with QED recorders (Section III.C) and a new QED module (Section III.A). Unlike our approach as presented here, S²QED is directly applicable to processors with out-of-order writeback. This is possible due to additional constraints that restrict the state of the instruction pipeline; such constraints can also be integrated into our framework. Further, S²QED does not require a QED module or QED recorders. However, it requires duplication of the CPU in the model of the design-under-test (only during pre-silicon verification), whereas our approach requires only a single CPU.

III. EXTENDING SQED WITH SYMBOLIC STARTING STATES

We now present our new extension of SQED [6] with symbolic starting states. In Section III.A, we describe the design of a new, improved QED module that is integrated in the model of the design-under-test during pre-silicon verification. These improvements enable the detection of real logic bugs that [6] fails to detect (see Section IV.A). Section III.B introduces a set of QED-constraints on the symbolic starting state that allow us to avoid false positives. The sufficiency of these constraints is proven in Appendix B. To implement the QED-constraints, we introduce QED-recorders in Section III.C. These are additional hardware modules (used only during pre-silicon verification) that record a small subset of internal logic values of the processor core to ensure that the QED-constraints are satisfied when a QED test begins. Fig. 2 contrasts Symbolic QED without/with symbolic starting states.

Fig. 2. Symbolic QED without/with symbolic starting states. a) SQED inputs and steps with concrete starting state; b) SQED inputs and steps with symbolic starting state.

A. New QED Module for Single Processor Cores

Pseudocode for the new QED module is given in Fig. 3a. Inputs are: 1) enable: disables the QED module if false; 2) next_instruction: next instruction to be executed; 3) fetch_next: true when the core is ready to receive an instruction, i.e., the fetch stage is not stalled; 4) original: tells the core to execute an original (if true) or duplicate (if false) instruction. Outputs are: 1) instruction_valid: indicates whether the output instruction is valid; and 2) instruction_out: instruction to be executed. The QED module has internal variables: 1) queue: a queue data structure that stores previous original instructions that have not yet been executed in the duplicate subsequence; 2) head_instruction: the previous head of the queue; 3) insert_valid: true when an instruction is loaded into the queue; 4) delete_valid: true when the QED module can execute a duplicate instruction; 5) duplicate_instruction: next instruction in the duplicate subsequence to execute (when original is false). QED checks occur when the qed_ready signal of the QED module is true. Pseudocode for determining this signal is given in Fig. 3b. To avoid trivial false positives, QED checks occur only when an equal number of commits (writes) have been made to original registers and duplicate registers. This is accomplished by keeping track of the number of original and duplicate commits to the register set, as shown in Fig. 3b. For simplicity, in Fig. 3b, we assume that at most one instruction commits per cycle. For superscalar processors that can commit multiple instructions in the same cycle, we track all corresponding pairs of write_valid (tells whether the input data is valid) and write_address (the address for the data to be written) signals, keep a separate is_original signal (identifies whether a write address corresponds to an original or duplicate location) for each instruction, and allow the original and duplicate counters to be incremented multiple times if needed.

Fig. 3. Pseudocode for a) New QED module; b) QED-ready enable logic.
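The behavior of the QED module and the QED-ready logic can be simulated cycle by cycle. The sketch below is a behavioral model under stated assumptions, not the hardware of Fig. 3: the class names QEDModule and QEDReady, the tuple instruction format, the queue capacity of 4, and the lambda used in place of create_duplicated_version are hypothetical, and the enable input is omitted. QEDReady accepts one (write_valid, write_address) pair per commit port, which covers the superscalar extension described above.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Optional, Tuple

Instr = Tuple[str, int, int, int]  # toy format (op, dst, src, imm); an assumption


class QEDModule:
    """Behavioral sketch of the Fig. 3a QED module (the enable input is omitted)."""

    def __init__(self, make_duplicate: Callable[[Instr], Instr], depth: int = 4):
        self.make_duplicate = make_duplicate  # stands in for create_duplicated_version
        self.depth = depth                    # assumed queue capacity
        self.queue: Deque[Instr] = deque()

    def cycle(self, next_instruction: Optional[Instr], fetch_next: bool,
              original: bool) -> Tuple[bool, Optional[Instr]]:
        """One clock cycle; returns (instruction_valid, instruction_out)."""
        insert_valid = fetch_next and original and len(self.queue) < self.depth
        delete_valid = fetch_next and not original and len(self.queue) > 0
        if insert_valid:
            self.queue.append(next_instruction)  # issue the original and remember it
            return True, next_instruction
        if delete_valid:
            head = self.queue.popleft()          # oldest original not yet duplicated
            return True, self.make_duplicate(head)
        return False, None


class QEDReady:
    """Sketch of Fig. 3b, generalized to several commits per cycle."""

    def __init__(self, is_write_to_original_space: Callable[[int], bool]):
        self.is_original = is_write_to_original_space
        self.count_original = 0
        self.count_duplicate = 0

    def cycle(self, writes: Iterable[Tuple[bool, int]]) -> bool:
        """writes holds one (write_valid, write_address) pair per commit port."""
        for write_valid, write_address in writes:
            if not write_valid:
                continue
            if self.is_original(write_address):
                self.count_original += 1
            else:
                self.count_duplicate += 1
        return self.count_original == self.count_duplicate  # qed_ready


# The BMC tool would choose `original` freely each cycle; here it is fixed.
module = QEDModule(make_duplicate=lambda i: (i[0], i[1] + 16, i[2] + 16, i[3]))
fetched: List[Optional[Instr]] = [("addi", 1, 1, 5), ("addi", 2, 2, 3), None,
                                  ("addi", 3, 3, 1), None, None]
choices = [True, True, False, True, False, False]
stream: List[Instr] = []
for ins, orig in zip(fetched, choices):
    valid, out = module.cycle(ins, fetch_next=True, original=orig)
    if valid:
        stream.append(out)

ready = QEDReady(is_write_to_original_space=lambda reg: reg <= 16)
ready_trace = [ready.cycle([(True, ins[1])]) for ins in stream]  # one commit per cycle
print(stream)
print(ready_trace)  # [False, False, False, False, False, True]
```

The issued stream interleaves originals and duplicates (R1, R2, R17, R3, R18, R19), and qed_ready rises only once the commit counts match again. Because the counters start equal, the check is tied to the rising edge of qed_ready, as in the property of Section II.B.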

The old QED module of [6] requires that all original instructions complete, a waiting period occurs for the pipeline to be flushed, and duplicate instructions execute, before the qed_ready signal is asserted. In contrast, the new QED module allows arbitrary interleaving of the original and duplicate instruction subsequences without requiring a waiting period. This additional timing diversity is made possible by giving the BMC tool control over the original input of Fig. 3a. The QED-ready logic (Fig. 3b) can be further enhanced as follows:

1. The current QED-ready logic is only applicable to single processor cores, since a multi-core system would require considering the original and duplicate commits across all cores. This can be challenging in situations where multiple cores operate with a shared address space. For simplicity, we do not consider this situation in this paper.
2. For some processors, e.g., superscalar processors with explicit register renaming (MIPS R10000 [19] and ARM’s Cortex-A15 [20]), an instruction cannot be designated as original or duplicate solely based on its physical address (unlike in Fig. 3b). This issue can be corrected by including the current state of the register mapping table as an input to the function is_write_to_original_space. Each time a QED check happens, the same mapping table must be used to map logical to physical addresses before comparing original and duplicate values. The RISC-V cores used in our experimental evaluation (see Section IV), however, do not have this issue.

[Fig. 2 content: a) Design phase: Core Design and QED Module; formal analysis: Connect Modules for Verification, then Run SQED from a Concrete Start State; a QED test failure means Bug Found. b) Design phase: Core Design, QED Module, and QED Recorders; formal analysis: Connect Modules for Verification, then SQED with Symbolic Start under QED Constraints; a QED test failure means Bug Found.]

Fig. 3a pseudocode (new QED module):

    INPUT: enable, next_instruction, fetch_next, original
    OUTPUT: instruction_out, instruction_valid
    // initialization
    queue ← ∅; head_instruction ← ∅;
    // end initialization
    insert_valid ← fetch_next & original & ~queue.is_full();
    delete_valid ← fetch_next & ~original & ~queue.is_empty();
    instruction_valid ← insert_valid | delete_valid;
    if insert_valid then
        queue.push(next_instruction);       // store next instruction in queue
    else if delete_valid then
        head_instruction ← queue.pop();     // remove head instruction
    end if
    duplicate_instruction ← create_duplicated_version(head_instruction);
    instruction_out ← (enable & ~original) ? duplicate_instruction : next_instruction;

Fig. 3b pseudocode (QED-ready enable logic):

    INPUT: write_valid, write_address
    OUTPUT: qed_ready
    // initialization
    qed_ready ← false; count_original ← 0; count_duplicate ← 0;
    // end initialization
    is_original ← is_write_to_original_space(write_address);
    if write_valid then
        if is_original then
            count_original++;    // increment num. orig. insts. committed
        else
            count_duplicate++;   // increment num. dup. insts. committed
        end if
    end if
    qed_ready ← (count_original == count_duplicate);

B. QED Constraints

We first define some terminology used in the constraint definitions:

i) Symbolic In-Flight (SIF) “instructions”: symbols (i.e., state bits) that are part of the symbolic starting state (and will be assigned 0s and 1s by the BMC tool), corresponding to (microarchitectural) flip-flops within the pipeline that hold instructions during normal operation of the core;
ii) $T_C$: the point in time when all SIF instructions have committed (i.e., written to the architectural state). This is determined by the BMC tool;
iii) Symbolic QED instructions: symbols that represent the instructions forming the bug trace generated by the BMC tool after $T_C$ (the bug trace is part of the counterexample, along with the starting state that BMC assigns); and
iv) Symbolic QED operand data: symbols representing the operand data of Symbolic QED instructions dispatched before $T_C$.

Fig. 4 illustrates these definitions for a 3-stage in-order pipeline. When the formal analysis begins, there are up to 3 SIF instructions in the pipeline, and all of them commit by time $T_C$. The first Symbolic QED instruction (R1=R1+5 in Cycle 1 of Fig. 4) is fetched into the pipeline, and its Symbolic QED operand data is available after the Dispatch stage. The QED constraints are stated as follows (Appendix C further details how each constraint is enforced):

Constraint C-1. At $T_C$, all SIF instructions have committed (i.e., no SIF instruction can write to the architectural state after $T_C$), while all Symbolic QED instructions commit after $T_C$.

Constraint C-2. At $T_C$, the architectural state (program-visible registers and memory) is QED-consistent (Section II.B), and nothing but Symbolic QED instructions can write to the architectural state after $T_C$ (e.g., test modes such as scan that bypass instructions to write to the architectural state are disabled).

Constraint C-3. All the operand data for each Symbolic QED instruction $I$ must satisfy one of the following properties: i) if the operand data is available at $T_C$ (i.e., $I$ has already read data for this operand), then it matches the data in the corresponding register or memory location (i.e., the source operand location) at $T_C$; ii) if the operand data is not available at $T_C$, then $I$ is waiting for the result of an earlier Symbolic QED instruction for this operand data.

Fig. 4. Timing diagram for a three-stage in-order pipeline satisfying all QED constraints. SIF instructions commit by time $T_C$, before all SQED instructions.

The QED constraints form a sufficient condition to ensure no false positives, given that bug-free designs satisfy two assumptions after $T_C$:

Assumption-1. If a Symbolic QED instruction is executed twice on the same data, it results in the same value being stored to the architectural state; e.g., Rx=1+2 and Ry=1+2 always result in the same value being stored to both registers Rx and Ry. Note: there is no assumption that the stored value is ‘3’.

Assumption-2. If a Symbolic QED instruction has a read-after-write dependency on earlier instructions, it uses the most recent value of the data in its computation. For example, in the program {R1=5; R2=R1+2; R3=R2-2}, if the first instruction stored ‘5’ to the architectural state, the second instruction will use the value ‘5’ for R1. We can now state the main theorem of the paper. Formal definitions and full proofs are deferred to Appendix B.

Theorem 1.

Let Constraints C-1, C-2, and C-3 be satisfied by a starting state of a processor core. Let Assumptions-1 and -2 hold after $T_C$ for any bug-free design of the core. If any EDDI-V test fails, the failure must be caused by a bug in the design.

Proof: See Appendix B. ∎

PROOF OUTLINE: We first define notation for a sequence of Symbolic QED instructions in a QED-compatible bug trace. Next, we isolate the first pair of Symbolic QED instructions that causes a failed EDDI-V test. We decompose the execution of these two instructions into a union of six mutually disjoint cases. For each case, we show by contradiction (of one or more Assumptions) that the design must contain a bug. ∎

We also observed empirically (see Section IV) that at least one assumption was violated in each BMC bug trace.
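To see what Constraints C-1 to C-3 rule out, they can be written as predicates over an abstract trace. This is a minimal sketch, assuming a trace representation we introduce only for illustration (Commit, Operand, and the functions c1, c2, and c3 are hypothetical names, registers only, no memory); in the actual flow the constraints are given to the BMC tool as described in Appendix C, not evaluated in Python. The example mirrors Fig. 4, with $T_C$ placed at an arbitrary cycle 3.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

N = 16  # assumed: registers 1..N are originals, N+1..2N their duplicates


@dataclass
class Commit:
    time: int     # cycle in which the instruction writes architectural state
    is_sif: bool  # True: SIF instruction; False: Symbolic QED instruction


@dataclass
class Operand:
    qed_index: int                        # position of the consuming Symbolic QED instruction
    src_reg: int                          # source operand location (registers only)
    value_at_tc: Optional[int]            # data already read at T_C, else None
    producer_index: Optional[int] = None  # earlier Symbolic QED instruction it waits on


def c1(commits: List[Commit], t_c: int) -> bool:
    """SIF instructions commit by T_C; Symbolic QED instructions commit after it."""
    sif_done = all(c.time <= t_c for c in commits if c.is_sif)
    qed_after = all(c.time > t_c for c in commits if not c.is_sif)
    return sif_done and qed_after


def c2(regs_at_tc: Dict[int, int]) -> bool:
    """Register part of C-2 only: QED-consistency at T_C. Memory and the
    'no other writers after T_C' condition are not modeled here."""
    return all(regs_at_tc[a] == regs_at_tc[a + N] for a in range(1, N + 1))


def c3(operands: List[Operand], regs_at_tc: Dict[int, int]) -> bool:
    """Each operand was read with its T_C value or waits on an earlier QED instruction."""
    for op in operands:
        if op.value_at_tc is not None:
            ok = op.value_at_tc == regs_at_tc[op.src_reg]
        else:
            ok = op.producer_index is not None and op.producer_index < op.qed_index
        if not ok:
            return False
    return True


# Fig. 4 style starting state: R1 = R17 = 2 at T_C; other registers 0.
regs = {r: 0 for r in range(1, 2 * N + 1)}
regs[1] = regs[17] = 2
commits = [Commit(1, True), Commit(2, True), Commit(4, False)]
print(c1(commits, t_c=3), c2(regs))  # True True
print(c3([Operand(0, 1, 2)], regs))  # True: R1=R1+5 read R1 = 2
print(c3([Operand(0, 1, 3)], regs))  # False: stale operand, state is excluded
```

The last case is the kind of starting state that could otherwise produce a false positive: the first Symbolic QED instruction already holds an operand value that no longer matches the architectural state its duplicate will read.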

C. Symbolic QED Recorders

QED recorders copy a small number of internal signals in a design (to track T_C and the Symbolic QED operands) so that we can specify the QED constraints to the BMC tool. For ease of understanding, we take an in-order core with single instruction fetch and a 5-stage pipeline as a running example in Section III.C, but we also explain how the technique generalizes to other cores. In Section IV, we present results for both in-order (scalar) and OoO (superscalar) cores.

Recorder for T_C. Because T_C depends on the starting state chosen by the BMC tool, it cannot be determined statically before the formal analysis begins. A recorder is used to give this information to the BMC tool dynamically. For an in-order core, T_C can be determined by simply tracking the progress of the first Symbolic QED instruction (the first symbolic instruction the BMC tool creates as part of the bug trace) until it reaches the commit stage (write-back stage) of the pipeline. At this time, all SIF instructions must have committed, as the pipeline is then occupied only by Symbolic QED instructions. Specifics of the T_C recorder for a 5-stage, single-fetch, in-order pipeline are given in Fig. 5. Its inputs are the ready signals of all stages that precede the commit stage (e.g., fetch_ready is true when the fetch stage is ready to receive an instruction). The output SIF_complete is true when the first Symbolic QED instruction has gone through all pipeline stages and reaches the commit stage. The output mode keeps track of the progress made so far by the Symbolic QED instruction (we later use this output in the Symbolic QED operand recorder).

[Figure: pipeline stages (Fetch, Dispatch/EX, Commit) versus time (Cycles 1-4). During initialization, SIF_Inst_1 to SIF_Inst_3 drain from the pipeline; at T_C, SIF is complete and the register state is QED-consistent (R1 = 2, R17 = 2); the Symbolic QED instructions R1=R1+5, R17=R17+5, R1=R1-2, R17=R17-2 follow.]

This T_C recorder for a 5-stage pipeline can easily be modified to support in-order pipelines with a different number of stages. For an OoO core, the T_C recorder is even simpler. We mark the entry allocated in the reorder buffer (ROB) for the first Symbolic QED instruction. After this, SIF_complete is assigned true when the ROB head pointer reaches the marked instruction. For cores with no ROB but with OoO commit (e.g., [18]), an additional constraint is required (see Section II.C).
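The in-order T_C recorder described above can be approximated by a cycle-level software model. The Python sketch below is a minimal illustration under stated assumptions, not the authors' RTL: mode is modeled as an integer counting how many pre-commit stages the first Symbolic QED instruction has passed, the ready-signal names follow the 5-stage running example, and the names TcRecorder and find_tc are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# Ready signals of the stages that precede commit, in pipeline order
# (names follow the 5-stage running example; an assumption of this sketch).
STAGE_READY_ORDER = ("fetch_ready", "dispatch_ready", "exec_ready", "mem_ready")


@dataclass
class TcRecorder:
    """Cycle-level sketch of a T_C recorder for an in-order pipeline.

    mode counts how many pre-commit stages the first Symbolic QED
    instruction has passed (0 .. len(STAGE_READY_ORDER)).
    """
    mode: int = 0
    sif_complete: bool = False

    def step(self, ready: Dict[str, bool]) -> None:
        """Advance one clock cycle given this cycle's stage-ready signals."""
        if self.mode < len(STAGE_READY_ORDER):
            if ready.get(STAGE_READY_ORDER[self.mode], False):
                self.mode += 1
        # The first Symbolic QED instruction has passed every stage before
        # commit, so (in order) all SIF instructions have already committed.
        self.sif_complete = self.mode == len(STAGE_READY_ORDER)


def find_tc(ready_trace: Iterable[Dict[str, bool]]) -> Optional[int]:
    """Return the first cycle index at which SIF_complete is true, else None."""
    recorder = TcRecorder()
    for cycle, ready in enumerate(ready_trace):
        recorder.step(ready)
        if recorder.sif_complete:
            return cycle
    return None
```

For example, with a one-cycle dispatch stall, find_tc([{"fetch_ready": True}, {"dispatch_ready": False}, {"dispatch_ready": True}, {"exec_ready": True}, {"mem_ready": True}]) returns 4. Supporting a pipeline with a different number of stages only requires changing STAGE_READY_ORDER, mirroring the remark above.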

Symbolic QED operand recorder.

Like T_C, the Symbolic QED operands also depend on the starting state. The Symbolic QED operand recorder stores information for both register and memory operands. Specifics of the Symbolic QED operand recorder for a 5-stage, single-fetch, in-order pipeline are given in Fig. 6. Its inputs are: 1) *_addr, which gives the register/memory address of the corresponding operand; 2) *_data, which gives the operand data; 3) *_valid, which is true when both *_addr and *_data are valid; and 4) mode, which gives the state of the T_C recorder (Fig. 5). The output *_buffer stores all Symbolic QED operands and their values (the buffer depth is determined by the maximum number of instructions in flight at a given time).

Fig. 5. Pseudocode for T_C recorder.
Fig. 6. Pseudocode for Symbolic QED operand recorder.

The formal tool is free to choose any values for the symbols (state bits) associated with SIF instructions, including values that do not constitute a valid instruction. Operands may come from either registers or memory locations. For register (memory) operands, the dispatch stage is the register read (memory read) stage.
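A minimal Python model of the operand recorder follows. It assumes a single in-order pipeline, (address, data) tuples for operands, and two hypothetical thresholds on the T_C recorder's mode (reg_start_mode, mem_start_mode) that mark when the register-read and memory-read stages can hold only Symbolic QED instructions. The bounded deque stands in for the buffer depth set by the maximum number of in-flight instructions; overwriting the oldest entry when full is our simplification.

```python
from collections import deque
from typing import Optional, Tuple

Operand = Tuple[int, int]  # (address, data) of one valid operand


class OperandRecorder:
    """Sketch of a Symbolic QED operand recorder (hypothetical parameters).

    reg_start_mode / mem_start_mode: first T_C-recorder mode value at which
    the register-read / memory-read stage holds only Symbolic QED
    instructions. depth: maximum number of in-flight instructions.
    """

    def __init__(self, depth: int, reg_start_mode: int, mem_start_mode: int):
        self.reg_start_mode = reg_start_mode
        self.mem_start_mode = mem_start_mode
        # Bounded buffers: the oldest entry is dropped when a buffer is full.
        self.src1_buffer = deque(maxlen=depth)
        self.src2_buffer = deque(maxlen=depth)
        self.mem_buffer = deque(maxlen=depth)

    def step(self, mode: int, src1: Optional[Operand] = None,
             src2: Optional[Operand] = None,
             mem: Optional[Operand] = None) -> None:
        """Record this cycle's operands; None plays the role of *_valid == false."""
        # Operands read before the threshold belong to SIF instructions
        # and are never recorded.
        if mode >= self.reg_start_mode:
            if src1 is not None:
                self.src1_buffer.append(src1)
            if src2 is not None:
                self.src2_buffer.append(src2)
        if mode >= self.mem_start_mode and mem is not None:
            self.mem_buffer.append(mem)
```

For instance, OperandRecorder(depth=4, reg_start_mode=1, mem_start_mode=3) ignores a register read issued while mode is 0 (a SIF instruction) but records one issued at mode 1.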

We store only the operand information of Symbolic QED instructions in the buffers, i.e., we do not store operand information of any SIF instruction. This is enforced by checking the T_C recorder state, i.e., mode (we do not add entries to *_buffer until all SIF instructions have passed through the dispatch stage). In Fig. 6, we assume that each instruction requires at most two register values and one memory value, but the idea is easily extended to more source operands. For an OoO core, Fig. 6 is extended to include Symbolic QED operands that are waiting on results of earlier Symbolic QED instructions. For each waiting operand, we also store the instruction tag (ROB entry number) of the instruction it is waiting for. This information is used to specify Constraint C-3 for an OoO core (see Appendix C).

IV. RESULTS

In this section, we demonstrate the effectiveness of our new technique on two open-source RISC-V processor cores: i) V-scale [16], an in-order core targeting embedded applications; and ii) RIDECORE [15], an OoO superscalar core (2-way pipeline, 64 maximum instructions in flight, 2 ALUs, 1 multiplier, 1 load/store unit) for high-performance applications. For BMC, we used the Questa Formal tool (version 10.5c) from Mentor Graphics on an AMD Opteron 6438 with 128 GB of RAM. For each core, we implemented the new QED module (Section III.A), the QED constraints (Section III.B), and the QED recorders (Section III.C).

A. Previously Unknown Bugs

We first found three previously unknown logic bugs in the multiplier reservation station (RS-m) of RIDECORE (all three confirmed by the RIDECORE designers [21]; see Table 1). These bugs activate only when back-to-back multiply instructions execute on successive clock cycles. They were detected thanks to the new QED module of this paper (see Section III.A). This design improves upon [6] by allowing arbitrary interleaving of original and duplicate instruction subsequences in EDDI-V tests without requiring a waiting period between them. The QED module of [6] cannot detect these bugs, and S²QED [14] is not applicable to RIDECORE. Follow-up work [185] is even less applicable, because its authors removed the multiplier altogether.

Table 1. New bugs in RIDECORE, with Symbolic QED runtimes. MULH is a signed multiply instruction that selects the upper half of the multiplier result; MULHU is an unsigned multiply that also selects the upper half of the multiplier result.

Bug Activation | Bug Effect | Runtime (power-on reset start state) | Runtime (symbolic start state)
All but one (buggy entry) RS-m entries occupied; MULH assigned to vacant entry. | First source operand of MULH corrupted. | 63 sec. | 25 min.
Same as above. | Second source operand of MULH corrupted. | 69 sec. | 61 min.
Same, but MULHU assigned. | Result corrupted. | 93 sec. | 64 min.

Footnote: This condition is required for OoO cores, where a Symbolic QED operand may wait on a SIF instruction instead of a Symbolic QED instruction.

Fig. 5 pseudocode (T_C recorder):
INPUT: fetch_ready, dispatch_ready, exec_ready, mem_ready
OUTPUT: SIF_complete, mode
// initialization
mode ← S_0; SIF_complete ← false;
// end initialization
if (mode == S_0) && (fetch_ready) then
    mode ← S_1; SIF_complete ← false; // inst passes fetch stage
end if
if (mode == S_1) && (dispatch_ready) then
    mode ← S_2; SIF_complete ← false; // inst passes decode stage
end if
if (mode == S_2) && (exec_ready) then
    mode ← S_3; SIF_complete ← false; // inst passes execute stage
end if
if (mode == S_3) && (mem_ready) then
    mode ← S_4; SIF_complete ← true; // inst passes mem stage
end if

Fig. 6 pseudocode (Symbolic QED operand recorder):
INPUT: src1_addr, src1_data, src1_valid, src2_addr, src2_data, src2_valid, mem_addr, mem_data, mem_valid, mode
OUTPUT: src1_buffer, src2_buffer, mem_buffer
// initialization
src1_buffer ← empty_buffer; src2_buffer ← empty_buffer; mem_buffer ← empty_buffer;
// end initialization
if (mode != S_0) && (mode != S_1) && (src1_valid || src2_valid) then
    if (src1_valid) then src1_buffer.add_entry(src1_addr, src1_data); end if
    if (src2_valid) then src2_buffer.add_entry(src2_addr, src2_data); end if
end if
if ((mode == S_3) || (mode == S_4)) && (mem_valid) then
    mem_buffer.add_entry(mem_addr, mem_data);
end if

We also found two bugs in V-scale (Table 2) by running Symbolic QED with the new QED module, starting from a concrete, power-on reset state; each was detected in less than 40 seconds (both confirmed by the designers). These bugs are due to errors in the V-scale implementation of the RISC-V privileged ISA [22], within specific Control Status Registers (CSRs). Importantly, V-scale does not implement shadow (duplicate) CSRs. To circumvent this, the EDDI-V transformation (Section II.A) duplicates instructions using a scratchpad memory for each CSR.

Table 2. Confirmed bugs in V-scale. Runtimes are for Symbolic QED with a concrete, power-on reset starting state.

Bug Activation | Bug Effect | Runtime
‘1’ is written to specific bit positions in the machine-interrupt CSR MIP. | MTIMECMP register corrupted; causes repeated interrupts. | 2 sec.
Any value with lower two bits ‘01’ or ‘10’ written to the machine-level CSR MSTATUS. | Design enters unspecified privilege level; MEPC corrupted. | 33 sec.

B. “Difficult” Logic Bugs and HT Scenarios

We injected 120 (117) logic bug types into RIDECORE (V-scale). These are “longer” (up to 256 consecutive activation instructions) versions of “difficult” logic bugs (see Appendix A; Table A.1.a-b) that occurred in various commercial designs [6]. We also injected 195 (156) difficult HT scenarios (see Appendix A; Table A.2.a-c), which encompass the HT attacks implemented in over 140 papers in the research literature (see Section V.C), into RIDECORE (V-scale). Results are in Table 3.
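The totals above follow from the parameter ranges listed in Appendix A. The Python sketch below gives one decomposition that reproduces them exactly; treating A.1.a.1 and A.1.a.5 as single unparameterized criteria, and counting each HT scenario as activation × effect × design technique, are our assumptions rather than statements from the paper.

```python
# Parameter ranges for "long" logic bugs (Appendix A).
N_Y_VALUES = [2 ** k for k in range(1, 9)]   # N = Y in {2, 4, ..., 256}
R_X_VALUES = list(range(2, 31, 2))           # R = X in {2, 4, ..., 30}

# Parameter ranges for HT scenarios (Appendix A).
HT_N_VALUES = [2 ** k for k in range(1, 9)]  # A.2.a.1: N in {2, 4, ..., 256}, M = 32
HT_X_VALUES = [128, 256]                     # A.2.a.2 and A.2.a.4

LOGIC_BUG_EFFECTS = 3      # Table A.1.b
HT_DESIGN_TECHNIQUES = 3   # Table A.2.c


def logic_bug_count(has_cache_criterion: bool) -> int:
    """Activation variants x bug effects (A.1.a.5 only where a cache exists)."""
    activations = (
        1                                    # A.1.a.1: assumed unparameterized
        + len(R_X_VALUES)                    # A.1.a.2: two instructions within X cycles
        + len(R_X_VALUES)                    # A.1.a.3: R registers all hold value V
        + len(N_Y_VALUES)                    # A.1.a.4: N instructions within Y cycles
        + (1 if has_cache_criterion else 0)  # A.1.a.5: specific cache state
    )
    return activations * LOGIC_BUG_EFFECTS


def ht_scenario_count(num_effects: int) -> int:
    """Activation variants x HT effects x design techniques."""
    activations = (
        len(HT_N_VALUES)      # A.2.a.1
        + len(HT_X_VALUES)    # A.2.a.2
        + 1                   # A.2.a.3: M = 64
        + len(HT_X_VALUES)    # A.2.a.4
    )
    return activations * num_effects * HT_DESIGN_TECHNIQUES


print(logic_bug_count(False), logic_bug_count(True))  # 117 120
print(ht_scenario_count(4), ht_scenario_count(5))     # 156 195
```

V-scale lacks A.1.a.5 and HT effect A.2.b.5, which is why its counts use has_cache_criterion=False and four HT effects.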

Table 3. “Long” logic bugs and HTs. We report [min, avg., max].

V-scale | “Long” Bugs | HTs
Total count injected | 117 | 156
Symbolic QED with symbolic starting state:
  Coverage | 100% | 100%
  Bug trace length (instructions) | [2, 2, 3] | [2, 2, 3]
  Bug trace length (clock cycles) | [5, 5, 6] | [5, 5, 6]
  BMC runtime (seconds) | [2, 4, 25] | [2, 11, 313]
Symbolic QED with concrete starting state:
  Coverage | 33% | 15.3%

RIDECORE | “Long” Bugs | HTs
Total count injected | 120 | 195
Symbolic QED with symbolic starting state:
  Coverage | 100% | 100%
  Bug trace length (instructions) | [4, 4, 4] | [4, 4, 4]
  Bug trace length (clock cycles) | [8, 8, 8] | [8, 8, 8]
  BMC runtime (minutes) | [7, 13, 18] | [7, 20, 121]
Symbolic QED with concrete starting state:
  Coverage | 5% | 8.7%

Observation 1:

SQED with symbolic starting states correctly and automatically found all “long” logic bugs in less than 30 minutes, with no false positives, including bugs that traditional BMC methods (including SQED with a concrete starting state) fail to detect.

SQED with a concrete starting state detected only 5% of these bugs in RIDECORE and 33% in V-scale.

Observation 2:

SQED with symbolic starting states correctly and automatically found all injected HTs (including those designed to evade state-of-the-art HT detection techniques; see Section V and Appendix A) in less than 2.5 hours, without requiring design-specific assertions or debugging of false positives. SQED with a concrete starting state detected only 9% of these HTs in RIDECORE and 15% in V-scale.

Footnote: The Matrix Multiply program (Section IV.C) comes packaged with RIDECORE by the designers as part of a testbench. It is used only for “extremal” bug creation, not for verification or bug detection.

C. “Extremal” Bugs

To further demonstrate the robustness of our technique, we injected “extremal” bugs (triggered only when the design reaches a very specific set of states) into RIDECORE. We focused on RIDECORE for this experiment since it is OoO, superscalar, and more complex than V-scale; moreover, S²QED [14] is not applicable to RIDECORE. Our extremal bug injection methodology is as follows: i) run Matrix Multiply (1M cycles) on the design in simulation, and stop the simulation at a random clock cycle; ii) run a uniformly random sequence of 100 ALU or Load/Store instructions; iii) select a uniformly random subset of flip-flops from the set of all flip-flops in the design and record their logic values; and iv) generate a bug (effect A.1.b.3 of Appendix A) and inject it into the design. This bug activates when the design reaches a state in which all flip-flops selected in step iii) hold the values recorded in step iii). We present our results in Table 4. To generate these extremal bugs, we randomly chose 180 time points (step i), ranging from 26,026 to 988,159 clock cycles after program start. For each time point, we ran a random 100-instruction sequence (step ii) and then randomly selected 10 different subsets of 128 flip-flops (step iii), resulting in 1,800 extremal bugs in total. Using the Questa Formal tool (version 10.5c) from Mentor Graphics on 6 AMD Opteron 6438 machines (in parallel) with 128 GB of RAM, we were able to run roughly 60 experiments to completion each day. We stopped at 1,800 experiments, after roughly 1 month of runtime. Whereas Symbolic QED with a concrete starting state detected 0% of these 1,800 “extremal” bugs, Symbolic QED with a symbolic starting state detected 1,763 of them (97.9%). For the remaining 37 cases, the BMC tool timed out after 24 hours. A closer inspection reveals that the BMC tool was not able to unroll the design beyond 7 clock cycles (8 clock cycles are needed to observe these bugs).
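Steps iii) and iv) amount to a trigger that compares a randomly chosen set of flip-flops against the values recorded from one simulated state. The Python sketch below models the design state as a flat list of bits (a hypothetical representation) and uses random.Random.sample for the uniform subset; the function names are illustrative only.

```python
import random
from typing import Dict, List


def make_extremal_trigger(snapshot: List[int], num_flops: int,
                          rng: random.Random) -> Dict[int, int]:
    """Step iii): choose num_flops flip-flops uniformly at random and
    record the values they hold in the simulated snapshot."""
    chosen = rng.sample(range(len(snapshot)), num_flops)
    return {index: snapshot[index] for index in sorted(chosen)}


def trigger_fires(state: List[int], trigger: Dict[int, int]) -> bool:
    """Step iv): the injected bug activates only when every selected
    flip-flop holds exactly the recorded value."""
    return all(state[index] == value for index, value in trigger.items())
```

With 128 selected flip-flops, a single mismatching bit keeps the bug dormant, which is what makes these triggers hard to reach from a concrete starting state.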
In future work, we plan to investigate ways to increase the unrolling depth that BMC tools can reach (following approaches such as [23, 24]).

Table 4. “Extremal” logic bugs in RIDECORE. We report [min, avg., max].

RIDECORE | “Extremal” Bugs
Total count injected | 1,800
Symbolic QED with symbolic starting state:
  Coverage | 97.9%
  Bug trace length (instructions) | [4, 4, 4]
  Bug trace length (clock cycles) | [8, 8, 8]
  BMC runtime (minutes) | [8, 33, 149]
Symbolic QED with concrete starting state:
  Coverage | 0%

Observation 3:

Our new Symbolic QED with symbolic starting states correctly and automatically found 97.9% of the “extremal” logic bugs and generated each bug trace in less than 2.5 hours. In contrast, Symbolic QED with a concrete starting state detected 0% of the “extremal” bugs.

V. RELATED WORK

In this section, we compare and contrast existing pre-silicon verification techniques for logic bug detection (Section V.A) and HT detection (Section V.B) with the method of this paper. In Section V.C, we provide a survey of HT attacks implemented in the research literature. We show that each attack fits into one of the categories (Appendix A; Tables A.2.a-b) used in our HT experiments (see Section IV.B).

A. Existing Pre-silicon Verification Methods for Logic Bugs

Existing formal verification techniques employing BMC [6, 10] struggle to detect logic bugs that require a long activation sequence. Other works for processor cores rely on theorem proving [25] or try to learn invariants of the design [26] to be used as constraints, but these techniques tend to be ad hoc and require a high level of manual effort. In seminal work [27] and its extensions [28], models of processors were verified based on abstraction by uninterpreted functions with equality. That approach generally requires invariants to be provided to avoid false positives. E-QED [29] is a BMC-based technique for electrical bug localization in post-silicon validation. Apart from its different target, it differs substantially from our technique; e.g., it does not rely on the duplication of instructions. False positives (e.g., counterexamples that start from unreachable states) are a major challenge for traditional BMC. Note that the QED constraints (Section III.B) used by our approach do not, in general, prevent false positives when BMC is used for general property checking. The following example illustrates this point. Let a processor core start in a state where the Exception Program Counter (EPC) (i.e., the register storing the return address for an exception) is misaligned (i.e., not aligned with any word in the instruction cache), the current PC is within an exception handling routine, and there are only NOP instructions in the pipeline. This is an unreachable state for processors with strict alignment rules (e.g., MIPS [19]). It is reasonable to check the property that the EPC is aligned, since returning to a misaligned address can cause programs to crash. Even at time T_C, when the NOP sequence has finished, this EPC will still be misaligned, so checking the property yields a false positive. With QED constraints (Section III.B), we do not get such a false positive, because the exception handling routine will be filled with valid QED tests. Hence, a QED check will not fail unless there is a bug in the design.

B. Existing Pre-silicon Verification Methods for HTs

Existing HT detection techniques that can be applied in pre-silicon verification broadly belong to two categories: i) design analysis methods; and ii) formal methods [30]. One class of design analysis techniques uses the observation that signals associated with HTs may be mostly unused or rare. [5, 31-33] use simulation data along with rareness metrics (e.g., code coverage, signal correlation). [34, 35] do not need simulation data, but still trade off false positives (i.e., spurious detection of HTs) against false negatives (i.e., failure to detect HTs), depending on the thresholds set for their rareness metrics. Additionally, stealthy HTs have been designed [11] to bypass such analyses. In contrast, our technique does not require simulation data, detects the stealthy HTs given in [11, 36-37], and does not produce any false positives. However, our technique targets processor cores, while the aforementioned analysis techniques are applicable to general designs. Formal methods for finding HTs generally use BMC [12, 46], SAT-based equivalence checking [13, 38-39], or theorem proving [40-42]. These techniques face challenges similar to those of BMC-based techniques for logic bug detection, and additionally require manual creation of properties. Complementary approaches to ours include techniques to detect HTs that leak sensitive data but do not produce incorrect logic values [43-46], and HT prevention methods [47-49].

C. Literature Survey of Implemented HT Attacks

To confirm that our list of HTs (Appendix A; Table A.2.a-b) is representative, we surveyed over 140 different papers that categorize and/or construct example HTs for experimental evaluation. All HT constructions are shown in Table 5 (with parameters given, if noted explicitly in the paper). Many of the papers use a large subset of the TrustHub [37], DeTrust [11], or [152] benchmarks. All of these benchmarks are covered by HT activation criteria from Table A.2.a. Additional papers [153-175] not included in Table 5 are surveys or papers that describe attacks in words. We confirmed that [153-175] contain no HT attack types that we do not encompass.

Table 5. HT attack(s) implemented in other works, grouped by trigger(s) from Table A.2.a. Each entry lists parameters (where given), IP block(s) infected, and references.

TrustHub HT Benchmarks [37] (triggers a.1-a.4): TrustHub Benchmark Circuits [5,31,34-36,65,69-71,91,119,148,183,184]; AES, RS232, Wishbone, BasicRSA [64]; AES, RSA [44,46]; DES Encryption Core [45,76]; 8051 Microcontroller [41]; RS232, Wishbone, PIC 8-bit, ISCAS-89 [13]; AES, RSA, RS232 [116]; OpenRISC 1200, LEON3, RS232, ISCAS [111]; RS232 [5,100,119]; AES-128 [59,60,126,138]; Spartan FPGA [145]; OpenSPARC T1 [120]; 4 Processors, FIR, ADPCM, CRC, MMult [136]; Binary Sequence Detector, RS232, ISCAS-89 [90]; AES, DES, SHA [125]
DeTrust HT Benchmarks [11] (triggers a.1-a.4): DeTrust Benchmark Circuits [35,71]; RS232, AES, Wishbone, BasicRSA [64]

Trigger a.1:
(N=1, M not given) UART Module [79]
(N=2, M=2) ISCAS benchmarks [65]
(N=2, M=11) AXI Interconnect Bus [127]
(N=3, M=4) OpenRISC 1200, Wishbone, ISCAS-89 [11]; ISCAS-89 benchmark circuits [54]
(N=3, M=16) 8051 Microcontroller [106]
(N=3, M=32) AES-128 [51]; HLS-FPGA [139]
(N=4, M=2) ISCAS-85 and ISCAS-89 benchmarks [63]
(N=4, M=8) RS232 [61]
(N=4, M=16) 8051 Microcontroller [12]
(N=4, M=128) AES-128 [12,46,140]
(N=5, M=3) TRNG (15-bit) [123]
(N=5, M=32) Leon2 Processor [85]
(N=2,4,8, M=2,4,8) AXI4, APB [67]
(N=8, M=64) Neural Network Accelerator [151]
(N=12, M=11) 8051 Microcontroller [53]
(N=14, M=42) UART Interface [53]
(N=16, M=128) Sequence Detector, RS232, ISCAS-89 [90]
(N=30, M=1) Chameleon Cryptography Core [87]
(N=32, M=4096) AES-256 (on FPGA in USB Drive) [128]
(N=4,16,64, M ≤ 4) OpenRISC 1200 [12]
(N<210) ISCAS-89, UART, AES [150]
(N=4000, M=8) NVM with 8-bit address [130]
(N=1,128,1024,8192, M=4,8,32,64,128) AES-128 [83]
Not specified: Leon3 [2,32]; ITC-99 benchmarks (slightly modified) [39]; 8051 Microcontroller, RC5 [40]; AES 128, 192, 256 [68]; 32-bit DLX processor [86]; Processor data memory [92]; TEA encryption on FPGA [97]; ISCAS-85 and ISCAS-89 benchmark circuits [77,98]; OpenSPARC T2 [103-104]; 4-bit adder circuit [105]; SystemC Accelerators [102,132-133]

Trigger a.2:
(X=3) 8-bit ALU [113]
(X=5) 16-bit SEC-DED decoder [113]
(X=8) 8-LED Circuit on Spartan-3 FPGA [84]
(X=4,8,12) AES-128 [134]
(X=19) RS232 [31]
(X=32) RS232 [61]
(X=128) AES-128 [12,46]
Not specified: Leon3 [2]; ITC-99 benchmarks (slightly modified) [39]; XOR-LFSR circuit [71]; ISCAS-85 benchmark circuits [77]; 64-core SoC on Virtex-7 FPGA [82]; 32-bit DLX processor [86]; TEA encryption on FPGA [97]; OpenSPARC T2 [103-104]; 4-bit adder circuit [105]; OpenSPARC T1 [120]; Zynq FPGA [124]; SystemC Accelerators [102,132]

Trigger a.3:
(M=1) RSA [58]; Sense-Amplifier [72]; Arithmetic, DSP cores (C programs) [94]; IIR Filter Accelerator (C program) [95]; SoC (32-bit DLX core, AES, FFT) [107]; ISCAS-89 benchmarks [107]; Turbo Decoder [115]
(M=2) ISCAS-85 and ISCAS-89 benchmarks [62,108-110,118]; DSP core on Virtex-7 FPGA [81]; ITC-99 benchmark circuits [108-110]; Synchronous FIFO (8-bit, 16-depth) [121]; NVM bit-cell [137]
(M=3) ISCAS-85 benchmark circuits [146]
(M=4) 8-bit Adder, UART [43]; ISCAS-85 and ISCAS-89 benchmarks [55,62,149]; 4-bit ALU [52]; AES [131]; OpenRISC1200 [131]
(M=6) Elliptic Curve Crypto Core; Wi-Fi FEC (Viterbi) decoder [66, 144]
(M=7) AES-128 [122]
(M=8) 8051 Microcontroller [12]; RS232 [61]; ISCAS-85 benchmark circuits [135]
(M=9) AES [31]
(M=10) UART C-code accelerator [102]
(M=16) ISCAS-85 benchmark circuits [31]; 10G Ethernet MAC circuit [37]; Virtex-7 FPGA LUT [80]; Memory controller with 16-bit address [89]; iCE40 FPGA [114]
(M=32) Reg. File Copy Circuit (only discussed) [42]; 16-bit Multiplier, 32-bit RSA (in ARM SoC) [73]; XTEA encryption module [101]
(M=40) NoC on FPGA [147]
(M=64) RS232 [56]; 10G Ethernet MAC circuit [96]; DES, Cellular Automata PRNG [101]
(M=72) NoC with ARM Core [142]
(M=16-128) AES-128 [57]
(M=128) AES-128 [12,46,52,102,112,140]
(M<180) ISCAS-89, AES, UART [150]
(M=67,72,152,200) OpenSPARC SRAMs, Custom SRAMs [78]
(M arbitrary) Leon3 [99]
Not specified: Leon3 [32]; ITC-99 benchmarks (slightly modified) [39]; AES-128 [48]; FHT, PID Control, FPU, PRNG, R-S Dec. [50]; ISCAS-85 Benchmarks [49,74,77]; Alpha Encryption on Spartan FPGA [75]; 32-bit DLX processor [86]; MPEG and 5 DSP accelerators [93]; TEA encryption on FPGA [97]; OpenSPARC T2 [103-104]; ReLU Hardware Block [117]; SystemC Accelerators [102,132-133]

Trigger a.4:
(X=5) 8080, RISC-V, and MIPS CPUs; AES, RC5 [143]
(X=7) ISCAS-85 benchmark circuits [31]
(X=10) ISCAS-89 benchmark circuits [38]
(X<58) ISCAS-89, AES, UART [150]
(X=128) AES-128 [140]
Not specified: PIC-16 [31]; ITC-99 benchmarks (slightly modified) [39]; 64-core SoC on Virtex-7 FPGA [82]; 32-bit DLX processor [86]; TEA encryption on FPGA [97]; 4-bit adder circuit [105]; TrustHub Benchmark Circuits [129]; ISCAS-89 benchmark on Virtex-II FPGA [141]

VI. CONCLUSION

In this paper, we extended Symbolic QED to include symbolic starting states. As a result, we overcome limitations of existing pre-silicon verification techniques for detecting logic bugs and HTs that require long activation sequences. The unique combination of Symbolic QED and QED constraints enables us to achieve this objective. Our results on multiple open-source RISC-V processor cores demonstrate the effectiveness and practicality of our approach: i) detection of previously unknown logic bugs within minutes; ii) detection of 100% of hundreds of long logic bugs and HTs (SQED with a concrete starting state detects, at best, 33%); and iii) detection of 97.9% of “extremal” logic bugs (SQED with a concrete starting state detects 0%). Future research directions include: i) extending our approach to detect bugs and HTs in other SoC components beyond processor cores, such as uncore components and accelerators; ii) handling other QED transformations beyond EDDI-V, e.g., CFTSS-V and CFCSS-V [6]; iii) automated methods for inserting QED recorders and generating QED constraints on the symbolic starting state; iv) theoretical comparison of the bug detection capabilities of [6], [14], and the method of this paper; and v) extension to bug and HT detection [176] for transistor-level analog circuits [177].

APPENDIX A: LOGIC BUG AND HARDWARE TROJAN TYPES

In the following tables, we give the different logic bug scenarios (harder versions of “difficult” bugs that occurred in various commercial designs [6]) and HT scenarios (from the research literature) used in Table 3 of Section IV. Each “long” logic bug is modeled with two parts: i) the activation criteria of the bug (Table A.1.a), i.e., the conditions that must be satisfied for the bug to activate; and ii) the effect of the bug once it is activated (Table A.1.b). For our experiments, we considered a whole range of values for the parameters in Table A.1, as follows: N = Y = {2, 4, 8, 16, 32, 64, 128, 256} and R = X = {2, 4, 6, …, 30}. This results in a total of 117 logic bugs in V-scale (A.1.a.5 is not possible) and 120 logic bugs in RIDECORE. Table A.2.a gives the HT activation types used in Table 3 of Section IV. Table A.2.b gives various effects an HT can have on the executing instructions [2]. Table A.2.c presents three HT implementation techniques used to inject HTs in designs [12]. We create stealthy HTs that are known to evade common detection techniques (e.g., HT designs from [36] evade detection techniques based on UCI [32] and coverage metrics [5, 32]). An HT scenario is formed by combining one activation criterion (Table A.2.a) with one effect (Table A.2.b) and an appropriate design technique (Table A.2.c). We used a wide range of HT scenario parameters for Table A.2: N = {2, 4, 8, …, 256} and M = 32 for A.2.a.1; X = {128, 256} for A.2.a.2 and A.2.a.4; and M = 64 for A.2.a.3. This results in 156 HT scenarios in V-scale (effect A.2.b.5 is not possible) and 195 HT scenarios in RIDECORE. These values make HTs harder to activate than the benchmark HTs in [37]. Logic bugs and HTs were injected by introducing a small state machine into the design that checks for the activation criteria and flips bits at targeted wires to achieve the bug or attack effect.

Table A.1.a.

Activation criteria for “long” logic bugs (Processor Core).
1. Data forwarding between pipeline stages.
2. Two specific instructions within X cycles.
3. R registers must all contain value V.
4. A specific sequence of N instructions must execute within Y cycles.
5. A specific cache state.

Table A.1.b.

Bug effects from [6] (Processor Core).
1. Next instruction corrupted to NOP.
2. Next instruction opcode incorrectly decoded.
3. Next instruction register read corrupted.

Table A.2.a.

Activation criteria for HTs (Processor Core).
1. Specific length-N sequence on M internal wires.
2. X-bit counter reaching its final value.
3. Comparator on M internal wires becomes true.
4. X-bit rare-event counter reaches a specific value.

Table A.2.b.

HT effects (Processor Core).
1. An in-flight instruction changed to NOP.
2. Opcode of an in-flight instruction changed.
3. Next register read corrupted.
4. Next result of an execution unit changed.
5. Corrupts ROB; prematurely commits next instruction.
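The injection mechanism described at the start of this appendix (a small state machine that checks an activation criterion and flips bits on targeted wires) can be sketched for activation A.2.a.1 combined with an opcode-corruption effect in the spirit of A.2.b.2. This is a minimal Python sketch under our own assumptions: the class and function names, the sliding-window detector, and the flip mask are illustrative and are not the RTL used in the experiments.

```python
from collections import deque
from typing import Iterable


class SequenceTrigger:
    """Hypothetical A.2.a.1-style trigger: fires when the last N values
    sampled on M monitored wires equal a specific length-N sequence."""

    def __init__(self, sequence: Iterable[int], m_wires: int):
        self.mask = (1 << m_wires) - 1
        self.sequence = tuple(value & self.mask for value in sequence)
        # Sliding window of the last N observed values (a shift register in RTL).
        self.window = deque(maxlen=len(self.sequence))

    def step(self, wires: int) -> bool:
        """Sample the M wires for one cycle; return True when the trigger fires."""
        self.window.append(wires & self.mask)
        return tuple(self.window) == self.sequence


def apply_effect(opcode: int, fire: bool, flip_mask: int) -> int:
    """Payload sketch: flip the targeted opcode bits only when triggered."""
    return opcode ^ flip_mask if fire else opcode
```

Because the window compares the most recent N samples, overlapping occurrences of the sequence also fire the trigger; a hardware implementation could instead use a counter-based FSM with the same observable behavior.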

Table A.2.c.

HT design techniques.
Method | HT stealthy against following techniques
[37] | Traditional pre-silicon verification.
[36] | UCI [32]; coverage metrics [5, 32].
[11] | [5, 32, 33, 34].

APPENDIX B: PROOFS

In this Appendix, we formalize our assumptions on how any bug-free design should operate. We relate these assumptions to Section III.B and prove Theorem 1. We first provide preliminary definitions [178]. We use the term alphabet to refer to a nonempty set of symbols. We often deal with operations on words (sequences of symbols from countable alphabets). We denote the empty word as $\varepsilon$ and the empty set as $\emptyset$. For finite vectors of equal length, $\mathbf{x} = (x_1, \ldots, x_N)$ and $\mathbf{y} = (y_1, \ldots, y_N)$, we write $\Delta(\mathbf{x}, \mathbf{y})$ for the subset of indices on which they differ. For example,

$$\Delta((9,0), (7,0)) = \{1\}, \qquad \Delta((1,0), (1,0)) = \emptyset.$$

The next definition provides the model of computation used in our statements.

Definition 1 (Transition System). A tuple $\mathbb{T} = \langle \mathcal{S}, \mathcal{S}_0, \mathbb{I}, \mathcal{T}, \mathcal{F} \rangle$ is called a transition system if
• $\mathcal{S}$ is a countable alphabet of states.
• $\mathcal{S}_0 \subseteq \mathcal{S}$ is the nonempty subset of initial states.
• $\mathbb{I}$ is a nonempty, finite set of actions.
• $\mathcal{T} \subseteq \mathcal{S} \times \mathbb{I} \times \mathcal{S}$ is the transition relation.
• $\mathcal{F} \subseteq \mathcal{S}$ is the nonempty subset of accept states.
Each transition system has an associated specification function $\mathit{Spec}: \mathcal{S} \times \mathbb{I} \to \mathcal{S}$, where $s_1 = \mathit{Spec}(s_0, I) \iff \langle s_0, I, s_1 \rangle \in \mathcal{T}$.

The state space $\mathcal{S}$ contains all states that can be represented by the transition system. $\mathcal{S}_0$ is the set of starting states in which the transition system can begin any execution. $\mathbb{I}$ is a set of actions that can be applied to steer the system from one state to the next. $\mathcal{T}$ is the transition relation, a countable set that represents the mapping each action implements. $\mathcal{F}$ is a set of accept states, in which executions can end. If $\mathcal{S}$ is finite and $|\mathcal{S}_0| = 1$ (i.e., there is only one possible starting state), the system is called a finite-state machine [179]. The next definition describes finite sequences of actions and the states they allow a system to traverse.

Definition 2 (Path). A path or trace of length $K$ in a transition system $\mathbb{T} = \langle \mathcal{S}, \mathcal{S}_0, \mathbb{I}, \mathcal{T}, \mathcal{F} \rangle$ is a pair $\big(s_0, \{I_i\}_{i=1}^{K}\big)$, where
i. $s_0 \in \mathcal{S}_0$ is an initial state.
ii. $\{I_i\}_{i=1}^{K}$ is an action sequence, where $I_i \in \mathbb{I}$ for $1 \le i \le K$.
iii. $\langle s_i, I_{i+1}, s_{i+1} \rangle \in \mathcal{T}$ for $0 \le i \le K-1$.
A path may also be denoted explicitly by the sequence of states traversed:

$$s_0 \xrightarrow{I_1} s_1 \to \cdots \xrightarrow{I_{K-1}} s_{K-1} \xrightarrow{I_K} s_K.$$

For any processor core, we assume there exists a transition system model (with a finite state space, but not restricted to a single starting state) that defines the behavior of the system. We call this transition system the ISA (e.g., [180, 181]) of the core. The actions in an ISA are called instructions. This concept is formalized in the next definition.
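$\Delta$ and Definitions 1-2 translate directly into executable checks. The Python sketch below is illustrative only: it represents the transition relation through its specification function Spec as a dictionary keyed by (state, action), and the function names and the toy counter in the usage note are hypothetical.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Set, Tuple


def delta(x: Sequence, y: Sequence) -> Set[int]:
    """Δ(x, y): the set of 1-based indices at which equal-length vectors differ."""
    if len(x) != len(y):
        raise ValueError("Δ is defined only for vectors of equal length")
    return {i for i, (a, b) in enumerate(zip(x, y), start=1) if a != b}


def path_states(s0: Hashable, actions: Sequence[Hashable],
                initial_states: Set[Hashable],
                spec: Dict[Tuple[Hashable, Hashable], Hashable]) -> Optional[List]:
    """Return [s_0, ..., s_K] if (s0, actions) is a path (Definition 2), else None.

    spec[(s, I)] = s' encodes <s, I, s'> in T; a missing key means no transition.
    """
    if s0 not in initial_states:          # condition i
        return None
    states = [s0]
    for action in actions:                # conditions ii and iii
        key = (states[-1], action)
        if key not in spec:
            return None
        states.append(spec[key])
    return states
```

For a 2-bit counter with spec = {(s, a): ((s + 1) % 4 if a == "inc" else s) for s in range(4) for a in ("inc", "nop")}, path_states(0, ["inc", "inc", "nop"], {0}, spec) returns [0, 1, 2, 2], and delta((9, 0), (7, 0)) returns {1}, matching the example above.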

Definition 3 (ISA). A transition system $\mathbb{T} = \langle \mathcal{S}, \mathcal{S}_0, \mathbb{I}, \mathcal{T}, \mathcal{F} \rangle$ with finite state space $\mathcal{S}$ is called an ISA with $N \ge 1$ original registers and $Q \ge 1$ original memory locations if
i. $\mathcal{S} = (R_1, \ldots, R_{2N}, M_1, \ldots, M_{2Q}, \mathbf{Z})$ is the state space.
ii. $(R_1, \ldots, R_{2N}, M_1, \ldots, M_{2Q})$ is called the architectural state.
iii. $\mathbf{Z}$ is the internal state.
iv. $R_1, \ldots, R_N$ are original registers; $R_{N+1}, \ldots, R_{2N}$ are duplicate registers. Registers can take any value from a finite alphabet $\mathcal{R}$.
v. $M_1, \ldots, M_Q$ are original memory locations and $M_{Q+1}, \ldots, M_{2Q}$ are duplicate memory locations. Memory locations can take any value from a finite alphabet $\mathcal{M}$.
vi. Each instruction $I \in \mathbb{I}$ has a set of source operands and a set of destination operands.
vii. $R_k$ is a source operand of $I \in \mathbb{I}$ iff $\exists s_0, s_1 \in \mathcal{S}$, with corresponding architectural states $a_0, a_1$, such that $\mathit{Spec}(s_0, I) \neq \mathit{Spec}(s_1, I)$ and $\Delta(a_0, a_1) = \{k\}$. Likewise, $M_j$ is a source operand of $I \in \mathbb{I}$ iff $\exists s_0, s_1 \in \mathcal{S}$, with corresponding architectural states $a_0, a_1$, such that $\mathit{Spec}(s_0, I) \neq \mathit{Spec}(s_1, I)$ and $\Delta(a_0, a_1) = \{2N + j\}$.
viii. A register or memory location is a destination operand of $I \in \mathbb{I}$ if there exists $\langle s_0, I, s_1 \rangle \in \mathcal{T}$ such that $s_0$ and $s_1$ differ at the corresponding index.
Instructions having only original registers and memory as both source and destination operands are called original instructions. Likewise, instructions using only duplicate registers and memory as operands are called duplicate instructions. An instruction is considered a NOP if the set of all its source and destination operands is $\emptyset$. Note that there is no loss of generality in this hard-coding of the architectural state indices: we can always define two permutations, $\sigma$ and $\mu$, on $[1, \ldots, 2N]$ and $[1, \ldots, 2Q]$ respectively, such that $\mathcal{S} = (R_{\sigma(1)}, \ldots, R_{\sigma(2N)}, M_{\mu(1)}, \ldots, M_{\mu(2Q)}, \mathbf{Z})$. Definition 3 is general enough to handle instructions with multiple destination operands, so superscalar cores that commit multiple instructions per cycle are still covered. Now we prove Theorem 1 of Section III.B.

Proof of Theorem 1: Let $\big(s_0, \{J_i\}_{i=1}^{K}\big)$ be a bug trace obtained from SQED with symbolic starting states, denoted as

$$s_0 \xrightarrow{J_1} s_1 \to \cdots \xrightarrow{J_{T_C}} s_{T_C} \xrightarrow{J_{T_C+1}} \cdots \xrightarrow{J_K} s_K.$$

Here, $\{J_i\}_{i=1}^{T_C}$ are the symbolic in-flight instructions and $s_{T_C}$ is a QED-consistent state (see Constraint C-2). Without loss of generality, assume the original, duplicate register and memory pairs for EDDI-V are chosen as $(R_i, R_{i+N})$ for $1 \le i \le N$, and $(M_j, M_{j+Q})$ for $1 \le j \le Q$. Let $\{O_i\}_{i=1}^{L}$ be the subsequence of original instructions and $\{D_i\}_{i=1}^{L}$ be the subsequence of duplicate instructions in the failing EDDI-V test. Indexing in the subsequences corresponds to the order in which instructions commit to architectural state. Without loss of generality, assume the failed EDDI-V test is due to mismatched values in two registers $(R_i, R_{i+N})$, as opposed to memory. Let $\mathit{val}(s, R_i)$ denote the value held by register $R_i$ in state $s$. Because $s_{T_C}$ is QED-consistent, $\mathit{val}(s_{T_C}, R_i) = \mathit{val}(s_{T_C}, R_{i+N})$. Further, it must be true that

$$\exists a \in \{1, \ldots, L\} \text{ s.t. } \big(\mathit{val}(s_{T_{O_a}}, R_i) \neq \mathit{val}(s_{T_{D_a}}, R_{i+N})\big) \wedge \bigwedge_{b \in \{1, \ldots, a-1\}} \big(\mathit{val}(s_{T_{O_b}}, R_i) = \mathit{val}(s_{T_{D_b}}, R_{i+N})\big), \tag{1}$$

where $T_{O_a}$ is the time at which $O_a$ writes (commits) to destination operand $R_i$, and $T_{D_a}$ is the time at which $D_a$ writes to destination operand $R_{i+N}$. Explicitly, $(O_a, D_a)$ is the first pair of SQED instructions that forces these two architectural state elements to be QED-inconsistent. Equation (1) is true because there must exist some Symbolic QED instructions that write to the pair $R_i, R_{i+N}$ such that the QED check fails on $s_K$, as Constraint C-1 ensures that all SIF instructions complete by $T_M$.
We can partially represent instructions $O_a$ and $D_a$ as follows:

$$val(s_{T_{O_a}}, R_i) = op(x_1, x_2, \ldots, x_m), \qquad val(s_{T_{D_a}}, R_{i+N}) = op(x_1', x_2', \ldots, x_m').$$

Here, $op$ is a function performed on the data stored in source operands $x_1, \ldots, x_m$, and each instruction writes its computed result to its destination operand. Because $O_a$ and $D_a$ are a pair of EDDI-V instructions, they implement the same function on their source operands. When $O_a$ and $D_a$ execute, exactly one of the following two conditions holds:

(A) $\forall j \in \{1, \ldots, m\},\ val(s_{T_{O_a}-1}, x_j) = val(s_{T_{D_a}-1}, x_j')$.

(B) $\exists j \in \{1, \ldots, m\}$ s.t. $val(s_{T_{O_a}-1}, x_j) \neq val(s_{T_{D_a}-1}, x_j')$.

First, assume (A) holds. Then all source operand data of $O_a$ and $D_a$ match:

$$val(s_{T_{O_a}}, R_i) = op(data_1, data_2, \ldots, data_m), \qquad val(s_{T_{D_a}}, R_{i+N}) = op(data_1, data_2, \ldots, data_m).$$

However, we must have $val(s_{T_{O_a}}, R_i) \neq val(s_{T_{D_a}}, R_{i+N})$ because an EDDI-V test failed. This implies that the same operation ($op$) performed on the same data yields two different values when executed twice, contradicting Assumption-1 for any bug-free design. Therefore, Theorem 1 holds for case (A).

Next, assume (B) holds instead of (A). There are five mutually disjoint subcases of (B), depending on when the operands' data are available for the instructions to compute on; for each subcase, we show that Theorem 1 holds.

(B.1) At $T_C$, data for both operands $x_j$ and $x_j'$ is available. As Constraint C-3(i) holds, $val(s_{T_C}, x_j) = val(s_{T_{O_a}-1}, x_j)$ and $val(s_{T_C}, x_j') = val(s_{T_{D_a}-1}, x_j')$. From Constraint C-2, $s_{T_C}$ is also QED-consistent, so $val(s_{T_C}, x_j) = val(s_{T_C}, x_j')$. Therefore, by transitivity of equality, $val(s_{T_{O_a}-1}, x_j) = val(s_{T_{D_a}-1}, x_j')$. This contradicts the assumption of (B). Thus, (B.1) cannot arise when the QED constraints hold.

(B.2) At $T_C$, data for only one operand (without loss of generality, $x_j$) is available. If (B.2) holds, exactly one of the following two cases can arise: (B.2.1) there are no earlier (before $D_a$) SQED instructions writing to $x_j'$; or (B.2.2) there is at least one earlier SQED instruction upon which $D_a$ has a RAW dependency for source operand $x_j'$.

Assume case (B.2.1). From Constraints C-3(i) and C-2, $val(s_{T_C}, x_j) = val(s_{T_{O_a}-1}, x_j)$ and $val(s_{T_C}, x_j) = val(s_{T_C}, x_j')$. If Assumption-2 holds, $val(s_{T_{D_a}-1}, x_j') = val(s_{T_C}, x_j')$, which implies, by transitivity of equality, that $val(s_{T_{O_a}-1}, x_j) = val(s_{T_{D_a}-1}, x_j')$. However, this contradicts the assumption of (B). Thus, for case (B.2.1), if an EDDI-V test fails, it is caused by a bug in the design.

Now assume that (B.2.2) holds instead of (B.2.1). Let $D_y$ be the last instruction on which $D_a$ has a RAW dependency for source operand $x_j'$. From Assumption-2, $val(s_{T_{D_y}}, x_j') = val(s_{T_{D_a}-1}, x_j')$. Let $O_y$ be the original instruction corresponding to $D_y$. Note that $O_a$ also has an equivalent RAW dependency upon $O_y$.
Therefore, from Assumption-2 we again have $val(s_{T_{O_y}}, x_j) = val(s_{T_{O_a}-1}, x_j)$. We also have, from Equation (1), $val(s_{T_{O_y}}, x_j) = val(s_{T_{D_y}}, x_j')$. Hence, by transitivity of equality, $val(s_{T_{O_a}-1}, x_j) = val(s_{T_{D_a}-1}, x_j')$. However, this contradicts the assumption of (B). Thus, for case (B.2.2), if an EDDI-V test fails, it is caused by a bug in the design.

(B.3) At $T_C$, data is available for neither operand $x_j$ nor $x_j'$. If case (B.3) holds, exactly one of two cases can arise: (B.3.1) there is no earlier (before $O_a$, $D_a$) SQED instruction pair that writes to $x_j$, $x_j'$; or (B.3.2) there is an earlier SQED instruction pair $O_y$, $D_y$ that writes to $x_j$, $x_j'$ and is the last pair on which $O_a$, $D_a$ have RAW dependencies for these operands.

Next, assume that (B.3.1) holds. From Assumption-2, $val(s_{T_C}, x_j) = val(s_{T_{O_a}-1}, x_j)$ and $val(s_{T_C}, x_j') = val(s_{T_{D_a}-1}, x_j')$. From Constraint C-2, $val(s_{T_C}, x_j) = val(s_{T_C}, x_j')$. Thus, by transitivity of equality, $val(s_{T_{O_a}-1}, x_j) = val(s_{T_{D_a}-1}, x_j')$. This contradicts (B). Hence, in case (B.3.1) we also conclude that if an EDDI-V test fails, it is caused by a bug in the design.

Finally, assume that (B.3.2) holds. From Assumption-2, $val(s_{T_{O_y}}, x_j) = val(s_{T_{O_a}-1}, x_j)$ and $val(s_{T_{D_y}}, x_j') = val(s_{T_{D_a}-1}, x_j')$. From Equation (1), $val(s_{T_{O_y}}, x_j) = val(s_{T_{D_y}}, x_j')$, and then, by transitivity, $val(s_{T_{O_a}-1}, x_j) = val(s_{T_{D_a}-1}, x_j')$. This contradicts (B).
Thus, for case (B.3.2), we also conclude that if an EDDI-V test fails, it is caused by a bug in the design. We have shown that in each of the six possible mutually disjoint cases (case (A) and the five subcases of (B)), an EDDI-V test failure can only be caused by a bug in the design. This proves Theorem 1. ∎

APPENDIX C: SPECIFYING QED CONSTRAINTS DURING BMC

In this Appendix, we describe in detail how we specify the QED constraints (see Section III.B) to the BMC tool.

Specifying C-1.

Constraint C-1 is naturally satisfied by processors with in-order pipelines, since their instructions commit in program order. This is not the case for OoO cores, because of instruction indirection, i.e., the renaming of instructions to their respective ROB entries to support OoO execution. When starting from a symbolic state, the BMC tool may choose ROB entry locations for SIF instructions such that SIF instructions commit after QED instructions, violating C-1. Specifying C-1 therefore only requires constraining the ROB entries of SIF instructions so that this cannot happen.
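A minimal Python sketch of what "constraining ROB entries" could mean is shown below. It assumes, hypothetically, a circular ROB that allocates entries in program order starting from a head pointer (the oldest entry); under that assumption, C-1 amounts to every SIF entry being older than every QED entry. The helper names are illustrative, and in a real flow this condition would be expressed as an assumption on the BMC starting state rather than as Python.

```python
from typing import Iterable


def rob_age(entry: int, head: int, rob_size: int) -> int:
    """Position of a ROB entry in commit order, counted from the head."""
    return (entry - head) % rob_size


def sif_commit_before_qed(sif_entries: Iterable[int], qed_entries: Iterable[int],
                          head: int, rob_size: int) -> bool:
    """C-1 check: every SIF entry must be older than every QED entry."""
    sif_ages = [rob_age(e, head, rob_size) for e in sif_entries]
    qed_ages = [rob_age(e, head, rob_size) for e in qed_entries]
    if not sif_ages or not qed_ages:
        return True  # nothing to order
    return max(sif_ages) < min(qed_ages)


# 8-entry ROB whose head (oldest entry) is slot 6; SIF instructions wrap around.
print(sif_commit_before_qed([6, 7, 0], [1, 2], head=6, rob_size=8))  # True
print(sif_commit_before_qed([1], [7], head=6, rob_size=8))           # False
```

Measuring age relative to the head, rather than comparing raw slot numbers, is what makes the check correct when the occupied region of the ROB wraps around.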

Specifying C-2.
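The three C-2 assumptions given in this subsection act as a filter on the traces that BMC may explore. The Python sketch below mimics that filter on a single concrete trace. The `Cycle` record, its field names, the 0-based index pairs standing in for $\Lambda_R$ and $\Lambda_M$, and the `satisfies_c2` helper are all hypothetical stand-ins for the SystemVerilog assume statements actually used; each list element is assumed to model one rising clock edge.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Cycle:
    sif_complete: bool  # T_C recorder output, assumed to stay high once set
    test_enable: int    # hypothetical scan/test-mode enable
    regs: List[int]     # register values, 0-based
    mem: List[int]      # memory values, 0-based


def satisfies_c2(trace: Sequence[Cycle],
                 reg_pairs: Sequence[Tuple[int, int]],
                 mem_pairs: Sequence[Tuple[int, int]]) -> bool:
    """Return True if a trace meets the three C-2 assumptions."""
    prev_sif = False
    for cyc in trace:  # each element models one rising clock edge
        if cyc.sif_complete and not prev_sif:  # rising edge of SIF_complete
            if any(cyc.regs[i] != cyc.regs[j] for i, j in reg_pairs):
                return False
            if any(cyc.mem[m] != cyc.mem[n] for m, n in mem_pairs):
                return False
        if cyc.sif_complete and cyc.test_enable != 0:
            return False  # test modes must stay off once SIF_complete is set
        prev_sif = cyc.sif_complete
    return True


ok = [Cycle(False, 1, [0, 5, 9, 5], [3, 3]),
      Cycle(True, 0, [4, 5, 4, 5], [3, 3])]
print(satisfies_c2(ok, reg_pairs=[(0, 2), (1, 3)], mem_pairs=[(0, 1)]))  # True
```

Note that the test mode may be enabled before $T_C$; only from the rising edge of $\text{SIF}_{\text{complete}}$ onward does the sketch reject it.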

Constraint C-2 requires that test modes, such as scan, that can bypass instructions and write directly to the architectural state be turned off at $T_C$ (otherwise, spurious counterexamples can occur). Assuming there is a $\text{Test}_{\text{enable}}$ signal to turn each scan or other test mode on or off, we can specify C-2 using the $T_C$ recorder (Fig. 5) and the following:

$$\uparrow(\text{SIF}_{\text{complete}}) \Rightarrow \forall (i, j) \in \Lambda_R,\ (R_i == R_j),$$

$$\uparrow(\text{SIF}_{\text{complete}}) \Rightarrow \forall (m, n) \in \Lambda_M,\ (M_m == M_n),$$

$$\uparrow(\text{Clock}) \Rightarrow \text{if } (\text{SIF}_{\text{complete}}),\ \text{Test}_{\text{enable}} == 0.$$

Above, $\uparrow(\text{Signal}_{\text{name}})$ is true on any clock edge where $\text{Signal}_{\text{name}}$ transitions from low to high; $\Lambda_R$ is the set of all mapped (original, duplicate) pairs of registers; $R_i$ is the value held in register $i$; $\Lambda_M$ is the set of all mapped (original, duplicate) pairs of memory locations; and $M_m$ is the value held in memory location $m$. For the experiments in Section IV, these statements were specified as SystemVerilog assume statements. Note that these constraints take the same form for both in-order and OoO cores.

Specifying C-3.
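The C-3(i) assumptions given in this subsection require that, at the rising edge of $\text{SIF}_{\text{complete}}$, every operand captured by the SQED operand recorders matches the architectural state. A minimal Python sketch of that comparison for an in-order core follows; `RecordedOperand` and `satisfies_c3i` are hypothetical names, register and memory indices are 0-based, and the additional ROB data that an OoO recorder also checks is not modeled.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class RecordedOperand:
    addr: int  # register index or memory address captured by a recorder
    data: int  # operand value captured by the same recorder


def satisfies_c3i(src1_buffer: Sequence[RecordedOperand],
                  src2_buffer: Sequence[RecordedOperand],
                  mem_buffer: Sequence[RecordedOperand],
                  regs: List[int], mem: List[int]) -> bool:
    """C-3(i) at the rising edge of SIF_complete: every recorded operand
    must match the architectural state it was read from."""
    regs_ok = all(s.data == regs[s.addr]
                  for s in list(src1_buffer) + list(src2_buffer))
    mem_ok = all(m.data == mem[m.addr] for m in mem_buffer)
    return regs_ok and mem_ok


regs, mem = [0, 11, 22, 11], [7, 7]
print(satisfies_c3i([RecordedOperand(1, 11)], [RecordedOperand(2, 22)],
                    [RecordedOperand(0, 7)], regs, mem))  # True
```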

Constraint C-3(ii) is vacuously true for in-order pipelines, as an instruction only makes progress once the data for all of its source operands has been read; otherwise, the instruction stalls until its operand data is available. We use the $T_C$ recorder (Fig. 5) and the SQED operand recorder (Fig. 6) to specify C-3(i):

$$\uparrow(\text{SIF}_{\text{complete}}) \Rightarrow \forall s \in \texttt{src1\_buffer}\ (s.\text{data} == R_{s.\text{addr}}),$$

$$\uparrow(\text{SIF}_{\text{complete}}) \Rightarrow \forall s \in \texttt{src2\_buffer}\ (s.\text{data} == R_{s.\text{addr}}),$$

$$\uparrow(\text{SIF}_{\text{complete}}) \Rightarrow \forall m \in \texttt{mem\_buffer}\ (m.\text{data} == M_{m.\text{addr}}).$$

Above, $s.\text{data}$ gives the data stored for entry $s$ in the buffer and $s.\text{addr}$ gives its address; $R_i$ is the value held in register $i$, and $M_m$ is the value held in memory location $m$. For an OoO core, C-3(i) is specified in the same way, except that the Symbolic QED operand recorder also checks additional ROB data (see Section III.C).

Finding Counterexamples Using BMC.
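The modified property described in this subsection fires only on the rising edge of $\text{QED}_{\text{ready}}\ \&\&\ \text{SIF}_{\text{complete}}$ and then compares each original register with its duplicate. The Python sketch below evaluates that check over one concrete trace; it is not the BMC encoding. It assumes, hypothetically, that the duplicate of register $a$ sits at index $a + N$ (the pairing assumed without loss of generality in the proof of Theorem 1), and `Step` and `first_qed_violation` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Step:
    qed_ready: bool
    sif_complete: bool
    regs: List[int]  # 2N registers, 0-based; duplicate of index a assumed at a + N


def first_qed_violation(trace: Sequence[Step], n: int) -> Optional[int]:
    """Return the first cycle at which the modified QED property fails, or None."""
    prev_trigger = False
    for t, step in enumerate(trace):
        trigger = step.qed_ready and step.sif_complete
        if trigger and not prev_trigger:  # rising edge of QED_ready && SIF_complete
            if any(step.regs[a] != step.regs[a + n] for a in range(n)):
                return t
        prev_trigger = trigger
    return None


trace = [Step(True, False, [1, 2, 9, 9]),   # QED_ready alone does not trigger
         Step(False, True, [1, 2, 1, 2]),
         Step(True, True, [1, 2, 1, 3])]    # triggered, R_2 != R_4
print(first_qed_violation(trace, n=2))      # 2
```

The first step shows why the pre-condition matters: before the SIF instructions have completed, a mismatch between original and duplicate registers is not reported as a counterexample.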

The property (see Section II.B) used by BMC to find counterexamples in Symbolic QED is modified, using the $T_C$ recorder (Fig. 5), to support symbolic starting states:

$$\uparrow(\text{QED}_{\text{ready}}\ \&\&\ \text{SIF}_{\text{complete}}) \Rightarrow \bigwedge_{a \in \{1, \ldots, N\}} (R_a == R_{a'})$$

Here, the only change is the addition of the $\text{SIF}_{\text{complete}}$ pre-condition to the QED property.

REFERENCES

[1] H. D. Foster, “Trends in functional verification: A 2014 industry study,” in Proc. DAC, 2015.

[2] S. T. King et al., “Designing and implementing malicious hardware,” in Proc. LEET, 2008.

[3] R. Karri et al., “Trustworthy hardware: identifying and classifying hardware Trojans,” Computer, 43(10):39-46, 2010.

[4] T. F. Wu et al., “TPAD: hardware Trojan prevention and detection for trusted integrated circuits,” IEEE Trans. CAD, 35(4):521-534, Apr. 2016.

[5] X. Zhang and M. Tehranipoor, “Case study: detecting hardware Trojans in third-party digital IP cores,” in Proc. HOST, 2011.

[6] E. Singh et al., “Logic bug detection and localization using symbolic quick error detection,” IEEE Trans. CAD, 2018.

[7] D. Lin et al., “Effective post-silicon validation of system-on-chips using quick error detection,” IEEE Trans. CAD, 33(10):1573-1590, Oct. 2014.

[8] E. Clarke et al., “Bounded model checking using satisfiability solving,” Formal Methods in System Design, 19(1):7-34, July 2001.

[9] E. Singh et al., “Symbolic QED pre-silicon verification for automotive microcontroller cores: industrial case study,” in Proc. DATE, 2019.

[10] A. Reid et al., “End-to-end verification of processors with ISA-Formal,” in Proc. CAV, 2016.

[11] J. Zhang et al., “Detrust,” in Proc. CCCS, 2014.

[12] J. V. Rajendran, V. Vedula, and R. Karri, “Detecting malicious modifications of data in third-party intellectual property cores,” in Proc. DAC, 2015.

[13] T. Reece and W. H. Robinson, “Detection of hardware Trojans in third-party intellectual property using untrusted modules,” IEEE Trans. CAD, 35(3):357-366, July 2016.

[14] M. R. Fadiheh et al., “Symbolic quick error detection using symbolic initial state for pre-silicon verification,” in Proc. DATE, 2018.

[15] “RIDECORE,” https://github.com/ridecore/ridecore.

[16] “V-scale,” https://github.com/ucb-bar/vscale.

[17] A. Biere et al., “Symbolic model checking without BDDs,” in Proc. TACAS, 1999.

[18] “Aquarius,” opencores.org/projects/aquarius.

[19] K. Yeager, “The MIPS R10000 superscalar microprocessor,” IEEE Micro, 16(2):28-41, Apr. 1996.

[20] “Cortex-A15,” https://tinyurl.com/y75sejwz.

[21] “Issue: bugs in rs_mul,” https://tinyurl.com/y8otzyxb.

[22] “Privileged Architecture,” https://tinyurl.com/y8jgqqza.

[23] M. K. Ganai, A. Gupta, and P. Ashar, “Efficient modeling of embedded memories in bounded model checking,” in Proc. CAV, 2004.

[24] M. K. Ganai and A. Gupta, “Accelerating high-level bounded model checking,” in Proc. ICCAD, 2006.

[25] J. Bhadra et al., “A survey of hybrid techniques for functional verification,” IEEE Design & Test, 24(2):112-122, June 2007.

[26] M. Thalmaier et al., “Analyzing k-step induction to compute invariants for SAT-based property checking,” in Proc. DAC, 2010.

[27] J. R. Burch and D. Dill, “Automatic verification of pipelined microprocessor control,” in Proc. ICCAD, 1994.

[28] S. Berezin et al., “Combining symbolic model checking with uninterpreted functions for out-of-order processor verification,” in Proc. FMCAD, 1998.

[29] E. Singh et al., “E-QED: electrical bug localization during post-silicon validation enabled by quick error detection and formal methods,” in Proc. CAV, 2017.

[30] K. Xiao et al., “Hardware Trojans: lessons learned after one decade of research,” TODAES, 22(1), May 2016.

[31] B. Cakir and S. Malik, “Hardware Trojan detection for gate-level ICs using signal correlation-based clustering,” in Proc. DATE, 2015.

[32] M. Hicks et al., “Overcoming an untrusted computing base: detecting and removing malicious hardware automatically,” in Proc. SSP, 2010.

[33] J. Zhang et al., “VeriTrust,” in Proc. DAC, 2013.

[34] A. Waksman et al., “FANCI: Identification of stealthy malicious logic using Boolean functional analysis,” in Proc. CCCS, 2013.

[35] S. Yao et al., “FASTrust,” in Proc. ITC, 2015.

[36] J. Zhang et al., “On hardware Trojan design and implementation at register-transfer level,” in Proc. HOST, 2013.

[37] H. Salmani, M. Tehranipoor, and R. Karri, “On design vulnerability analysis and trust benchmarks development,” in Proc. ICCAD, 2013.

[38] M. Banga and M. S. Hsiao, “Trusted RTL: Trojan detection methodology in pre-silicon designs,” in Proc. HOST, 2010.

[39] G. Shrestha and M. S. Hsiao, “Ensuring trust of third-party hardware design with constrained sequential equivalence checking,” in Proc. IEEE Conf. on Tech. for Homeland Security, 2012.

[40] X. Guo et al., “Eliminating the hardware-software boundary: a proof-carrying approach for trust evaluation on computer systems,” IEEE Trans. Inf. Forensics Sec., 12(2):405-417, Feb. 2017.

[41] Y. Jin and Y. Makris, “A proof-carrying based framework for trusted microprocessor IP,” in Proc. ICCAD, 2013.

[42] E. Love, Y. Jin, and Y. Makris, “Proof-carrying hardware intellectual property: A pathway to trusted module acquisition,” IEEE Trans. Inf. Forensics Security, 7(1):25-40, Feb. 2012.

[43] N. Fern, I. San, and K. T. T. Cheng, “Detecting hardware Trojans in unspecified functionality through solving satisfiability problems,” in Proc. ASP-DAC, 2017.

[44] W. Hu et al., “Detecting hardware Trojans with gate-level information-flow tracking,” Computer, 49(8):44-52, Aug. 2016.

[45] Y. Jin and Y. Makris, “Proof carrying-based information flow tracking for data secrecy protection and hardware trust,” in Proc. VTS, 2012.

[46] J. V. Rajendran et al., “Formal Security Verification of Third-Party Intellectual Property Cores for Information Leakage,” in Proc. Intl. Conf. on VLSI Design, 2016.

[47] R. S. Chakraborty and S. Bhunia, “HARPOON: an obfuscation-based SoC design methodology for hardware protection,” IEEE Trans. CAD, 28(10):1493-1502, Oct. 2009.

[48] S. Dupuis et al., “A novel hardware logic encryption technique for thwarting illegal overproduction and hardware Trojans,” in Proc. IOLTS, 2014.

[49] M. S. Samimi et al., “Hardware enlightening: nowhere to hide your Hardware Trojans!” in Proc. IOLTS, 2016.

[50] A. Al-Anwar et al., “Hardware Trojan protection for third party IPs,” in Proc. Euromicro Conf. on Digital Sys. Design, 2013.

[51] S. S. Ali et al., “Multi-level attacks: An emerging security concern for cryptographic hardware,” in Proc. DATE, 2011.

[52] H. Amin et al., “System-level protection and hardware Trojan detection using weighted voting,” J. Advanced Research, 5(4):499-505, July 2014.

[53] J. Backer et al., “Reusing the IEEE 1500 design for test infrastructure for security monitoring of Systems-on-Chip,” in Proc. DFT, 2014.

[54] M. Banga and M. S. Hsiao, “A novel sustained vector technique for the detection of hardware Trojans,” in Proc. VLSI Design, 2009.

[55] M. Banga and M. S. Hsiao, “ODETTE: A non-scan design-for-test methodology for Trojan detection in ICs,” in Proc. HOST, 2011.

[56] A. Baumgarten et al., “A case study in hardware Trojan design and implementation,” Intl. J. Info. Security, 10(1):1-14, Feb. 2011.

[57] S. Bhasin et al., “Hardware Trojan horses in cryptographic IP cores,” in Proc. FDTCS, 2013.

[58] S. Bhunia et al., “Protection against hardware trojan attacks: Towards a comprehensive solution,” IEEE Design & Test, 30(3):6-17, June 2013.

[59] M.-M. Bidmeshki and Y. Makris, “Toward automatic proof generation for information flow policies in third-party hardware IP,” in Proc. HOST, 2015.

[60] M.-M. Bidmeshki et al., “Data secrecy protection through information flow tracking in proof-carrying hardware IP—part II: framework automation,” IEEE Trans. Info. Forensics and Sec., 12(10):2430-2443, Feb. 2017.

[61] C. Bobda et al., “Hardware sandboxing: a novel defense paradigm Against hardware Trojans in systems on chip,” in Proc. ARC, 2017.

[62] R. S. Chakraborty et al., “MERO: A statistical approach for hardware Trojan detection,” in Proc. CHES, 2009.

[63] R. S. Chakraborty et al., “Security against hardware Trojan attacks using key-based design obfuscation,” J. Elect. Test, 27(6):767–785, Dec. 2011.

[64] X. Chen et al., “Hardware Trojan detection in third-party digital intellectual property cores by multilevel feature analysis,” IEEE Trans. CAD.

[65] F. Farahmandi, Y. Huang, and P. Mishra, “Trojan localization using symbolic algebra,” in Proc. ASP-DAC, 2017.

[66] N. Fern, S. Kulkarni, and K. T. T. Cheng, “Hardware Trojans hidden in RTL don’t cares - automated insertion and prevention methodologies,” in Proc. ITC, 2015.

[67] N. Fern et al., “Hiding hardware Trojan communication channels in partially specified SoC bus functionality,” IEEE Trans. CAD.

[68] X. Guo et al., “Automatic RTL-to-Formal code converter for IP security formal verification,” in Proc. MTV, 2016.

[69] S. K. Haider et al., “Hatch: A formal framework of hardware Trojan design and detection,” Univ. Connecticut, Cryptol. ePrint Arch., Tech. Rep.

[70] S. K. Haider, C. Jin, and M. Dijk, “Advancing the state-of-the-art in hardware Trojans design,” in Proc. MWSCAS, 2017.

[71] S. K. Haider et al., “Advancing the state-of-the-art in hardware Trojans detection,” IEEE Trans. Dependable and Secure Comp., 16(1):18-32, Jan. 2019.

[72] D. H. K. Hoe et al., “Towards secure analog designs: a secure sense amplifier using memristors,” in Proc. ISVLSI, 2014.

[73] N. Hu, M. Ye, and S. Wei, “Surviving information leakage hardware Trojan attacks using hardware isolation,” IEEE Trans. Emerging Topics in Computing, 7(2):253-261, Apr. 2019.

[74] S. Jha and S. K. Jha, “Randomization based probabilistic approach to detect Trojan circuits,” in Proc. HASE, 2008.

[75] Y. Jin et al., “Experiences in hardware Trojan design and implementation,” in Proc. HOST, 2009.

[76] Y. Jin et al., “Data secrecy protection through information flow tracking in proof-carrying hardware IP—Part I: framework fundamentals,” IEEE Trans. Info. Forensics and Sec., 12(10):2416-2429, Feb. 2017.

[77] V. Jyothi et al., “TAINT: tool for automated INsertion of Trojans,” in Proc. ICCD, 2017.

[78] S. Kan and J. Dworak, “Triggering Trojans in SRAM circuits with X-propagation,” in Proc. DFTS, 2014.

[79] C. Krieg et al., “A process for the detection of design-level hardware Trojans using verification methods,” in Proc. HPCC, 2014.

[80] C. Krieg et al., “Malicious LUT: a stealthy FPGA Trojan injected and triggered by the design flow,” in Proc. ICCAD, 2016.

[81] C. Krieg et al., “Toggle MUX: How X-optimism can lead to malicious hardware,” in Proc. DAC, 2017.

[82] A. Kulkarni et al., “SVM-based real-time hardware Trojan detection for many-core platform,” in Proc. ISQED, 2016.

[83] N. Lesperance, S. Kulkarni, and K. T. Cheng, “Hardware Trojan detection using exhaustive testing of k-bit subspaces,” in Proc. ASP-DAC, 2015.

[84] H. Liu, H. Luo, and L. Wang, “Design of hardware Trojan horse based on counter,” in Proc. Intl. Conf. on QRRMS Engineering, 2011.

[85] B. Liu and R. Sandhu, “Fingerprint-based detection and diagnosis of malicious programs in hardware,” IEEE Trans. Reliability, 64(3):1068-1077, Sept. 2015.

[86] S. Mal-Sarkar et al., “Design and validation for FPGA Trust under hardware Trojan attacks,” IEEE Trans. Multi-Scale Comp. Sys., 2(3):186-198, Sept. 2016.

[87] M. Muehlberghuber et al., “Red team vs. blue team hardware Trojan analysis,” in Proc. HASP, 2013.

[88] S. Narasimhan, R. S. Chakraborty, and S. Chakraborty, “Hardware IP protection during evaluation using embedded sequential Trojan,” IEEE Design & Test, 29(3):70-79, June 2012.

[89] M. Rathmair, F. Schupfer, and C. Krieg, “Applied formal methods for hardware Trojan detection,” in Proc. ISCAS.

[90] T. Reece, D. B. Limbrick, and W. H. Robinson, “Design comparison to identify malicious hardware in external intellectual property,” in Proc. ICESS, 2011.

[91] H. Salmani, “COTD: reference-free hardware Trojan detection and recovery based on controllability and observability in gate-level netlist,” IEEE Trans. Info. Forensics and Sec., 12(2):338-350, Feb. 2017.

[92] B. Schneier, “Evil maid attacks on encrypted hard drives,” Crypto-Gram Newsletter, 2009.

[93] A. Sengupta and S. Bhadauria, “Untrusted third party digital IP cores: Power-delay trade-off driven exploration of hardware Trojan secured datapath during high level synthesis,” in Proc. GLSVLSI, 2015.

[94] A. Sengupta, S. Bhadauria, and S. P. Mohanty, “TL-HLS: methodology for low cost hardware Trojan security aware scheduling with optimal loop unrolling factor during high level synthesis,” IEEE Trans. CAD.

[95] A. Sengupta and D. Roy, “Protecting IP core during architectural synthesis using HLT-based obfuscation,” Elect. Lett., 53(13):849-851, June 2017.

[96] B. Shakya et al., “Benchmarking of hardware Trojans and maliciously affected circuits,” J. HSS, 1(1):85-102, June 2017.

[97] D. M. Shila and V. Venugopal, “Design, implementation and security analysis of hardware Trojan threats in FPGA,” in Proc. ICC, 2014.

[98] O. Sinanoglu et al., “Reconciling the IC test and security dichotomy,” in Proc. European Test Symposium, 2013.

[99] C. Sturton et al., “Defeating UCI: building stealthy and malicious hardware,” in Proc. SSP, 2011.

[100] D. Sullivan et al., “FIGHT-Metric: functional identification of gate-level hardware trustworthiness,” in Proc. DAC, 2014.

[101] N. G. Tsoutsos et al., “Advanced techniques for designing stealthy hardware Trojans,” in Proc. DAC, 2014.

[102] N. Veeranna and B. C. Schafer, “Hardware Trojan detection in behavioral intellectual properties (IP's) using property checking techniques,” IEEE Trans. Emerg. Topics in Comp., 5(4):577-585, Oct. 2017.

[103] A. Waksman and S. Sethumadhavan, “Tamper evident microprocessors,” in Proc. SSP, 2010.

[104] A. Waksman and S. Sethumadhavan, “Silencing hardware backdoors,” in Proc. SSP, 2011.

[105] X. Wang et al., “Sequential hardware Trojan: side-channel aware design and placement,” in Proc. ICCD, 2011.

[106] X. Wang et al., “Software exploitable hardware Trojans in embedded processor,” in Proc. Intl. Symp. on DFT, 2012.

[107] X. Wang et al., “IIPS: infrastructure IP for secure SoC design,” IEEE Trans. Computers, 64(8):2226-2238, Aug. 2015.

[108] S. Wei et al., “Hardware Trojan horse benchmark via optimal creation and placement of malicious circuitry,” in Proc. DAC, 2012.

[109] S. Wei et al., “Provably complete hardware Trojan detection using test point insertion,” in Proc. ICCAD, 2012.

[110] S. Wei and M. Potkonjak, “Malicious circuitry detection using fast timing characterization via test points,” in Proc. HOST, 2013.

[111] J. Zhang et al., “VeriTrust: verification for hardware trust,” IEEE Trans. CAD, 34(7):1148-1162, July 2015.

[112] E. Zhou et al., “Nonlinear Analysis for Hardware Trojan Detection,” in Proc. ICSPCC, 2015.

[113] I. H. Abbassi et al., “TrojanZero: switching activity-aware design of undetectable hardware Trojans with zero power and area footprint,” in Proc. DATE, 2019.

[114] Q. A. Ahmed et al., “Proof-carrying hardware versus the stealthy malicious LUT hardware Trojan,” in Proc. ARC, 2019.

[115] K. Balasubramanian et al., “Effect of sign-bit-flipping Trojan on turbo coded communication systems,” in Proc. ICDCN, 2019.

[116] Z. Chen et al., “Toward FPGA security in IoT: a new detection technique for hardware Trojans,” IEEE IoT Journal, 6(4):7061-7068, Aug. 2019.

[117] J. Clements and Y. Lao, “Hardware Trojan design on neural networks,” in Proc. ISCAS, 2019.

[118] T. Dhar et al., “Hardware Trojan detection by stimulating transitions in rare nets,” in Proc. VLSID, 2019.

[119] C. Dong et al., “A multi-layer hardware Trojan protection framework for IoT chips,” IEEE Access, 7:23628-23639, Feb. 2019.

[120] R. Elnaggar et al., “Hardware Trojan detection using changepoint-based anomaly detection techniques,” IEEE Trans. VLSI, July 2019.

[121] N. Fern and K.-T. T. Cheng, “Evaluating assertion set completeness to expose hardware Trojans and verification blindspots,” in Proc. DATE, 2019.

[122] A. P. Fournaris et al., “An efficient multi-parameter approach for FPGA hardware Trojan detection,” Microproc. and Microsyst., 71:102863, Aug. 2019.

[123] S. Ghandali et al., “Temperature-based hardware Trojan for ring-oscillator-based TRNGs,” arXiv:1910.00735, Sept. 2019.

[124] K. Guha et al., “Criticality based reliability against hardware Trojan attacks for processing of tasks on reconfigurable hardware,” Microproc. and Microsyst., 71:102685, Aug. 2019.

[125] X. Guo et al., “QIF-Verilog: quantitative information-flow based hardware description languages for pre-silicon security assessment,” in Proc. HOST, 2019.

[126] T. Han et al., “Hardware Trojans detection at register transfer level based on machine learning,” in Proc. HOST, 2019.

[127] J. He et al., “SoC interconnection protection through formal verification,” Integration, the VLSI Journal, 64:143-151, Jan. 2019.

[128] P. Swierczynski et al., “Interdiction in practice—hardware Trojan against a high-security USB flash drive,” J. Crypt. Eng., 7(3):199-211, Oct. 2017.

[129] K. Huang et al., “Holistic hardware Trojan design of trigger and payload at gate level with rare switching signals eliminated,” IEICE Elec. Exp., 16(4):1-6, Aug. 2019.

[130] M. N. I. Khan et al., “Hardware Trojans in emerging non-volatile memories,” in Proc. DATE, 2019.

[131] C. Kison et al., “Security implications of intentional capacitive crosstalk,” IEEE Trans. Info. Forensics Sec., 14(12):3246-3258, Dec. 2019.

[132] N. Veeranna and B. C. Schafer, “S3CBench: synthesizable security systemC benchmarks for high-level synthesis,” J. Hardw. Syst. Secur., 1(2):103-113, Aug. 2017.

[133] H. M. Le et al., “Detection of hardware Trojans in SystemC HLS designs via coverage-guided fuzzing,” in Proc. DATE, 2019.

[134] Y. Liu et al., “Hardware Trojan detection leveraging a novel golden layout model towards practical applications,” J. Electronic Testing, 35(4):529-541, Aug. 2019.

[135] Y. Lyu and P. Mishra, “Efficient test generation for Trojan detection using side channel analysis,” in Proc. DATE, 2019.

[136] A. Malekpour, “Hardware Trojan detection and recovery in MPSoCs via on-line application specific testing,” in Proc. DDECS, 2019.

[137] K. Nagarajan et al., “ENTT: a family of emerging NVM-based Trojan triggers,” in Proc. HOST, 2019.

[138] L. N. Nguyen et al., “Creating a backscattering side channel to enable detection of dormant hardware Trojans,” IEEE Trans. VLSI, 27(7):1561-1574, July 2019.

[139] C. Pilato et al., “High-level synthesis of benevolent Trojans,” in Proc. DATE, 2019.

[140] M. Qin et al., “Theorem proof-based gate level information flow tracking for hardware security verification,” Comp. & Sec., 35:225-239, Aug. 2019.

[141] O. Ranjbar et al., “A unified approach to detect and distinguish hardware Trojans and faults in SRAM-based FPGAs,” J. Elect. Testing, 35(2):201-214, Apr. 2019.

[142] V. Y. Raparti et al., “Lightweight mitigation of hardware Trojan attacks in NoC-based manycore computing,” in Proc. DAC, 2019.

[143] M. Shayan et al., “Hardware Trojans inspired hardware IP watermarks,” IEEE Design & Test, July 2019.

[144] K. S. Subramani et al., “Demonstrating and mitigating the risk of an FEC-based hardware Trojan in wireless networks,” IEEE Trans. Inf. Forensics Sec., 14(10):2720-2734, Oct. 2019.

[145] Y. Tang et al., “Activity factor based hardware Trojan detection and localization,” J. Electronic Testing, 35(3):293-302, 2019.

[146] H. Thapliyal and Z. Kahleifeh, “Solving energy and cybersecurity constraints in IoT devices using energy recovery computing,” in Proc. GLSVLSI, 2019.

[147] J. Wang et al., “A benchmark suite of hardware Trojans for on-chip networks,” IEEE Access, 7:102002-102009, Aug. 2019.

[148] Y. Wang et al., “Ensemble-learning-based hardware Trojans detection method by detecting the trigger nets,” in Proc. ISCAS, 2019.

[149] M. Xue et al., “Building an accurate hardware Trojan detection technique from inaccurate simulation models and unlabelled ICs,” IET Comp. & Digital Tech., 13(4):348-359, July 2019.

[150] S. Yu et al., “An improved automatic hardware Trojan generation platform,” in Proc. ISVLSI, 2019.

[151] Y. Zhao et al., “Memory Trojan attack on neural network accelerators,” in Proc. DATE, 2019.

[152] J. Cruz et al., “An automated configurable Trojan insertion framework for dynamic trust benchmarks,” in Proc. DATE, 2018.

[153] X. Wang et al., “Detecting malicious inclusions in secure hardware: challenges and solutions,” in Proc. HOST, 2008.

[154] M. Abramovici and P. Bradley, “Integrated circuit security: new threats and solutions,” CSIIRW, no. 55, 2009.

[155] M. Tehranipoor and F. Koushanfar, “A survey of hardware Trojan taxonomy and detection,” IEEE Design & Test, 27(1):10–25, Jan. 2010.

[156] M. Tehranipoor et al., “Trustworthy hardware: Trojan detection and design-for-trust challenges,” Computer, (7):66-74, July 2011.

[157] D. Saha and S. Sur-Kolay, “SoC: a real platform for IP reuse, IP infringement, and IP protection,” VLSI Design, 2011.

[158] M. Rostami et al., “Hardware security: threat models and metrics,” in Proc. ICCAD.

[159] S. Bhunia et al., “Hardware Trojan attacks: threat analysis and countermeasures,” Proc. IEEE, 102(8):1229-1247, Aug. 2014.

[160] J. Rajendran, O. Sinanoglu, and R. Karri, “Regaining trust in VLSI design: design-for-trust techniques,” Proc. IEEE, 102(8):1266-1282, Aug. 2014.

[161] M. Rostami et al., “A primer on hardware security: models, methods, and metrics,” Proc. IEEE, 102(8):1283-1295, Aug. 2014.

[162] C. Liu et al., “Shielding heterogeneous MPSoCs from untrustworthy 3PIPs through security-driven task scheduling,” IEEE Trans. Emerging Topics in Comp., 2(4):461-472, Dec. 2014.

[163] J. Rajendran et al., “Fault analysis-based logic encryption,” IEEE Trans. Computers, 64(2):410-424, Feb. 2015.

[164] J. Francq and F. Frick, “Introduction to hardware Trojan detection methods,” in Proc. DATE, 2015.

[165] S. Bhasin and F. Regazzoni, “A survey on hardware Trojan detection techniques,” in Proc. ISCAS, 2015.

[166] J. Rajendran et al., “Belling the CAD: toward security-centric electronic system design,” IEEE Trans. CAD, 34(11):1756-1769, Nov. 2015.

[167] C. Bao, D. Forte, and A. Srivastava, “On reverse engineering-based hardware Trojan detection,” IEEE Trans. CAD.

[168] S. Amir et al., “Comparative analysis of hardware obfuscation for IP protection,” in Proc. GLSVLSI, 2017.

[169] P. Mishra et al., Hardware IP Security and Trust. Springer, 2017.

[170] C. Pilato et al., “Securing hardware accelerators: a new challenge for high-level synthesis,” IEEE Embed. Syst. Lett., 10(3):77-80, Sept. 2018.

[171] A. Sengupta et al., “Low cost functional obfuscation of reusable IP cores used in CE hardware through robust locking,” IEEE Trans. CAD, 38(4):604-616, Apr. 2019.

[172] J. Knechtel et al., “3D integration: another dimension toward hardware security,” in Proc. IOLTS, 2019.

[173] H. Salmani et al., “Special session: countering IP security threats in supply chain,” in Proc. VTS, 2019.

[174] E. Sarkar and M. Maniatakos, “On automating delayered IC analysis for hardware IP protection,” in Proc. VTS, 2019.

[175] S. Sidhu et al., “Hardware security in IoT devices with emphasis on hardware Trojans,” J. Sensor Actuator Networks, 8(3):1-19, Aug. 2019.

[176] X. Guo et al., “When capacitors attack: formal method driven design and detection of charge-domain Trojans,” in Proc. DATE, 2019.

[177] G. Gielen et al., “Review of methodologies for pre-and post-silicon analog verification in mixed-signal SOCs,” in Proc. DATE, 2019.

[178] C. Baier and J.-P. Katoen, Principles of Model Checking. MIT Press, 2008.

[179] M. Sipser, Introduction to the Theory of Computation. Cengage Learning, 2013.

[180] “The RISC-V Instruction Set Manual,” https://tinyurl.com/yajxsfng.

[181] “OpenRISC 1000 Architecture Manual,” https://tinyurl.com/yyvglmyt.

[182] F. Lonsing et al., “Unlocking the Power of Formal Hardware Verification with CoSA and Symbolic QED,” in Proc. ICCAD, 2019.

[183] M. Oya et al., “A score-based classification method for identifying hardware-trojans at gate-level netlists,” in Proc. DATE, 2015.

[184] S. M. Sebt et al., “An efficient technique to detect stealthy hardware Trojans independent of the trigger size,” J. Elect. Test, Dec. 2019.

[185] K. Devarajegowda et al., “Gap-free processor verification by S²QED and property generation,” in Proc.