[PDF] Open-Source Verification with Chisel and Scala

Abstract

Performance increase with general-purpose processors has come to a halt. We can no longer depend on Moore's Law to increase computing performance. The only way to achieve higher performance or lower energy consumption is by building domain-specific hardware accelerators. To efficiently design and verify those domain-specific accelerators, we need agile hardware development. One of the main obstacles when proposing such a modern method is the lack of modern tools to attack it. To be able to verify a design in such a time-constrained development method, one needs to have efficient tools both for design and verification. This paper thus proposes ChiselVerify, an open-source tool for verifying circuits described in any Hardware Description Language. It builds on top of the Chisel hardware construction language and uses Scala to drive the verification using a testing strategy inspired by the Universal Verification Methodology (UVM) and adapted for designs described in Chisel. ChiselVerify is created based on three key ideas. First, our solution highly increases the productivity of the verification engineer, by allowing hardware testing to be done in a modern high-level programming environment. Second, the framework functions with any hardware description language thanks to the flexibility of Chisel blackboxes. Finally, the solution is well integrated into the existing Chisel universe, making it an extension of currently existing testing libraries. We implement ChiselVerify in a way inspired by the functionalities found in SystemVerilog. This allows one to use functional coverage, constrained-random verification, bus functional models, transaction-level modeling and much more during the verification process of a design in a contemporary high-level programming ecosystem.

Full PDF

OOpen-Source Veriﬁcation with Chisel and Scala

Andrew Dobis, Tjark Petersen, Kasper Juul Hesse Rasmussen, Enrico Tolotto,Hans Jakob Damsgaard, Simon Thye Andersen, Richard Lin, Martin Schoeberl

Department of Applied Mathematics and Computer ScienceTechnical University of Denmark

Lyngby, Denmark

Department of Electrical Engineering and Computer SciencesUC Berkeley

Berkeley, [email protected]ﬂ.ch, [email protected], [email protected], [email protected],[email protected], [email protected], [email protected], [email protected]

Abstract —Performance increase with general-purpose proces-sors has come to a halt. We can no longer depend on Moore’sLaw to increase computing performance. The only way to achievehigher performance or lower energy consumption is by buildingdomain-speciﬁc hardware accelerators. To efﬁciently design andverify those domain-speciﬁc accelerators, we need agile hardwaredevelopment. One of the main obstacles when proposing such amodern method is the lack of modern tools to attack it. To beable to verify a design in such a time-constrained developmentmethod, one needs to have efﬁcient tools both for design andveriﬁcation.This paper thus proposes ChiselVerify, an open-source toolfor verifying circuits described in any Hardware DescriptionLanguage. It builds on top of the Chisel hardware constructionlanguage and uses Scala to drive the veriﬁcation using a testingstrategy inspired by the Universal Veriﬁcation Methodology(UVM) and adapted for designs described in Chisel. ChiselVerifyis created based on three key ideas. First, our solution highlyincreases the productivity of the veriﬁcation engineer, by al-lowing hardware testing to be done in a modern high-levelprogramming environment. Second, the framework functionswith any hardware description language thanks to the ﬂexibilityof Chisel blackboxes. Finally, the solution is well integrated intothe existing Chisel universe, making it an extension of currentlyexisting testing libraries.We implement ChiselVerify in a way inspired by the func-tionalities found in SystemVerilog. This allows one to use func-tional coverage, constrained-random veriﬁcation, bus functionalmodels, transaction-level modeling and much more during theveriﬁcation process of a design in a contemporary high-levelprogramming ecosystem.

Index Terms —digital design, veriﬁcation, Chisel, Scala

I. I

NTRODUCTION

We can no longer depend on Moore’s Law to increase com-puting performance [9]. Performance increase with general-purpose processors has come to a halt. The only way toachieve higher performance or lower energy consumption isby building domain-speciﬁc hardware accelerators [7]. Theseaccelerators can be built in chips or in FPGAs in the cloud.The production of a chip is costly. Therefore, it is essential to get the design right at the ﬁrst tape-out. Thorough testing andveriﬁcation of the design is mandatory.To efﬁciently develop and verify those accelerators, we canlearn from software development trends such as agile softwaredevelopment [6]. We believe that we need to adapt to agilehardware development [11].Furthermore, as accelerators become part of the cloud ser-vice, i.e. FPGAs in the cloud, software developers will increas-ingly need to adapt critical algorithms to FPGAs to enhanceperformance. Hence, it is imperative to make accelerator de-sign accessible for software developers. By adapting hardwareaccelerator design to the methods and tools of contemporarysoftware design, it is possible to bridge both domains cateringfor a more uniform hardware/software development process.Up until a few years ago, the two main design languages,Verilog and VHDL, dominated the design and testing of digitalcircuits. However, compared to software development andtesting, digital design and testing methods/tools lack severaldecades of development. We propose a tool that leveragessoftware development and testing methods for digital design.Based in the hardware construction language Chisel [4], whichitself is embedded in Scala, our framework reimagines func-tionalities form Universal Veriﬁcation Method (UVM) withSystemVerilog [1] and adapts them for the existing Chiselecosystem.We thus developed a method and concrete tools for ag-ile hardware development. ChiselVerify combines tools, lan-guages, development, and testing methods from the lastdecades in software development and applies them to hardwaredesign. We aim to raise the tooling level for a digital designto increase productivity. The workload demanded for theveriﬁcation (testing) of digital systems is about double thetime of developing them in the ﬁrst place.Using the power and ﬂexibility of Chisel Blackboxes, ourtool can be used to verify designs implemented in any of themajor hardware description languages (i.e., VHDL or Verilog)with little overhead. Furthermore, golden models described in a r X i v : . [ c s . P L ] F e b ny programming language can be used using the Java JNI.Our work builds upon existing open-source projects and is thusalso open-source.We developed an object-oriented and functional frameworkfor veriﬁcation in Scala. This framework is inspired by UVM,but leverages Scala’s conciseness with the combination ofobject-oriented programming with functional programming.An initial experiment of testing the accumulator circuit of theLeros processor [16] showed the that a test written with UVMwas about 800 lines of code, where a Scala based test wasaround 80 lines of code [15]. However, UVM supports morefunctionality that a plain ChiselTest in Scala.Within our veriﬁcation framework, we support mixed lan-guage veriﬁcation. Verilog can easily be combined with Chisel,as Chisel generates Verilog, and we use ChiselTest as a driverfor the open-source Verilog simulator Verlator. With Yosyssynthesis suite [18] and GHDL [10] we can translate VHDLinto Verilog.A veriﬁcation method is only usable when it can handlemixed-source designs. This means a Scala driven method mustbe able to test components written in Verilog, VHDL, andSystemVerilog.Chisel has support for black boxes, which allows the useof Verilog code within the Chisel design. Therefore, it isrelatively easy to integrate Verilog components when wrappedinto a black box. However, this forces Chisel to use Verilatorinstead of Treadle to run the simulation, impacting startuptime.Chisel does not fully support VHDL. It can support VHDLusing VCS, but there is no open-source solution available forVHDL simulation. For companies with a lot of source codewritten in VHDL this is a concern, as they must be able tointegrate their existing IP in a Scala/Chisel based design andveriﬁcation workﬂow. All major commercial simulation andsynthesis tools support mixed-language designs, but no open-source tools exist that provide the same functionality.To alleviate this issue, we use the open-source Yosyssynthesis suite [18]. Yosys is an open-source digital hardwaresynthesis suite for Verilog. Yosys also has a variety of plugins,one of these being a plugin for using GHDL [10], an open-source VHDL simulator. By using Yosys in conjunction withGHDL, VHDL ﬁles are compiled to an RTL-based intermedi-ate representation, which is then written to a Verilog ﬁle usingYosys. GHDL has full support for IEEE 1076 VHDL 1987,1993, 2002, and a subset of 2008. A working solution namedVHDL2Verilog has been made for this, which has been testedwith certain simple VHDL designs [2].This paper is an extension of [15].In the following sections, we will explore the differentbackgrounds on which ChiselVerify was based.II. B ACKGROUND AND S TATE - OF - THE -A RT VHDL and Verilog are the classic hardware descriptionlanguages, ﬁrst appeared in the 1980s. SystemVerilog [1], asan extension to Verilog, adds features from VHDL for the hardware description and object-oriented features for veriﬁ-cation. Recent advances with SystemVerilog and Chisel [4],[14] have brought object-oriented programming into the digitaldesign and veriﬁcation process.Chisel is a “Hardware Construction Language” embeddedin Scala, to describe digital circuits [4]. Scala/Chisel bringsobject-oriented and functional programming into the world ofdigital design. For hardware generation and testing, the fullScala language and Scala and Java libraries are available. AsScala and Java’s full power is available to the veriﬁcationengineer, the veriﬁcation process is also made more efﬁcient.Chisel is a hardware construction language embedded inScala. Chisel allows the user to write hardware generatorsin Scala, an object-oriented and functional language. Forexample, we read in the string based schedules for a network-on-chip [17] and convert them with a few lines of Scala codeinto a hardware table to drive the multiplexer of the router andthe network interface.Chisel is solely a hardware construction language, and thusall valid Chisel code maps to synthesizable hardware. Byseparating the hardware construction and hardware veriﬁcationlanguages, it becomes impossible to write non-synthesizablehardware and in turn, speeds up the design process.Chisel and Scala are executing on the Java virtual machineand therefore have a very good interoperability with Java.Therefore, we can leverage a large pool of Java librariesfor hardware design and veriﬁcation. Furthermore, the namespace of packets in Scala/Java simpliﬁes integration of externalcomponents. Open source hardware components in Chisel canbe organized like software libraries at Maven servers.SystemVerilog adds object-oriented concepts for the non-synthesizable veriﬁcation code. The SystemVerilog directprogramming interface [8] allows the programmer to callC functions inside a SystemVerilog (UVM) testbench. Thisenables co-simulation with a “golden model” written in C,and the testbench verifying the device under test (DUT). WithChiselTest we can co-simulate with Java and Scala models anduse the Java Native Interface to co-simulate with models in C.The Universal Veriﬁcation Method (UVM) is an open sourcecollection of SystemVerilog, which is becoming popular inindustry. SystemVerilog has become a complex language withmore than 250 keywords, and it is unclear which toolssupport which language constructs for hardware description.In contrast with Chisel, when the program compiles, it issynthesizable hardware. Chisel is a small language, wherethe cheat sheet ﬁts on two pages. The power of Chiselcomes from the embedding in Scala. Furthermore, as classichardware description languages are niche products, not manytools or libraries are available. With Chisel on Scala we havethe choice of different integrated development environments(IDE), testing infrastructure (e.g., ScalaTest), and many freelibraries.The Java JNI (Java Native Interface) allows for a simi-lar functionality in Java programs, allowing them to call Cfunctions and use their functionality. By using Scala, whichis built on Java, it is our hope to use the JNI together withcala’s test frameworks. The aim is to develop a framework forco-simulation with Scala/Chisel testers and a C-based goldenmodel. This should allow companies to keep their existing Cmodels, but move their simulation workﬂow into Scala/Chiseltesters.The digital design described in Chisel can be tested andveriﬁed with ChiselTest [12], a non-synthesizable testingframework for Chisel. ChiselTest emphasizes usability andsimplicity while providing ways to scale up complexity. Fun-damentally, ChiselTest is a Scala library that provides accessinto the simulator through operations like poke (write valueinto circuit), peek (read value from circuit, into the testframework), and step (advance time). As such, tests writtenin ChiselTest are just Scala programs, imperative code thatruns one line after the next. This structure uses the latest pro-gramming language developments that have been implementedinto Scala and provides a clean and concise interface, unlikeapproaches that attempt to reinvent the wheel like UVM.Furthermore, ChiselTest tries to enable testing best practicesfrom software engineering. Its lightweight syntax encourageswriting targeted unit tests by making small tests easy. A clearand clean test code also enables the test-as-documentationpattern, demonstrating a module’s behavior from a temporalperspective.III. C

ONSTRAINT R ANDOM V ERIFICATION

The complexity of digital design is growing with the ca-pacity of the silicon. A decade ago, the industry started tomove away from “direct” testing towards functional coverageand formal methods. One of the pillars of functional veri-ﬁcation is constraint programming. Constraint programming(CP) is a programming paradigm that has been developedsince the mid-1980s and emerged as a further developmentof logic programming. Constraint-based programming allowsconstraints and their solution mechanisms to be integrated intoa programming language. With constraint programming, theuser describes the problem in a declarative way, while thesolution process takes a back seat from the user’s perspective.A subset of these problems is the so-called Constraint Sat-isfaction Problems (CSP), which are mathematical problemsdeﬁned as a set of objects such that their state must satisfyseveral constraints. CSP represents the entities of a problemas a ﬁnite homogeneous collection of constraints. typedef enum {UNICAST=11,MULTICAST,BROADCAST}pkt_type; class frame_t; rand pkt_type ptype; rand integer len; rand bit [7:0] payload []; constraint common { payload.size() == len; } // Constraint the members constraint unicast { len <= 2; ptype == UNICAST; } // Constraint the members constraint multicast { len >= 3; len <= 4; ptype == MULTICAST; } endclass Listing 1. Random object in SystemVerilog

Listing 1 shows a class named ”frame t”. It uses the ”rand”keyword for variables ”len”, ”ptype”, and payload. There-fore these are the variables that can be randomized. Thenconstraints to these variables are applied and declared bythe “common” ”unicast,” and ”multicast” constraint groups.Each class in SystemVerilog has an intrinsic method called”randomize(),” which causes new values to be selected for allthe variables declared with the rand keyword. The selectedvalue for each variable will respect the constraints appliedto it. If there are “rand” variables that are unconstrained, arandom value inside their domain will be assigned. Combiningrandom classes using the inheritance OOP paradigm allows thecreation of general-purpose models that can be constrained toperform domain-speciﬁc functions. In the research process, aCSP solver was implemented in Scala based on the methoddescribed in [13]. The implementation is composed of twomain components. The ﬁrst one is the CSP solver itself,which uses a combination of backtracking and arc consistencyto generate solutions for well-deﬁned problems. The secondcomponent is a small DSL, which allows users to declare andrandomize objects. object pktType extends SVEnumeration { val UNICAST: Value = Value(11) val MULTICAST: Value = Value(0) val BROADCAST: Value = Value(1) val domainValues = { values.map(x => x.id).toList } } class Frame extends

Random { import pktType._ var pType: RandInt = rand(pType, pktType.domainValues()) var len: RandInt = rand(len, 0 to 10 toList) var noRepeat: RandCInt = randc( noRepeat, 0 to 1 toList) var payload: RandInt = rand(payload, 0 to 7 toList) val common = constraintBlock ( binary ((len, payload) => len == payload) ) val unicast = constraintBlock( unary (len => len < = unary (pType => pType == UNICAST.id) ) val multicast = constraintBlock( unary (len => len > = unary (len => len < = unary (pType => pType == MULTICAST.id) ) } Listing 2. Random object in Scala

Listing 2, shows an example of random a object. Contraryto SystemVerilog, to declare a random object, the user has toextend the class from the Random base-class provided by thelibrary. After that each random variable has to be declared oftype ”RandInt” and initialized with the ”rand” macro. Finally,like for SystemVerilog, in Scala inheriting the Random baseclass exposes the method ”random” which assigns randomvalues to the random ﬁelds of the class.IV. V

ERIFICATION OF

AXI4 I

NTERFACED C OMPONENTS

Another solution to the ever-increasing complexity of digitaldesigns is to use standardized interfaces, which enable greaterreuse. One of such standard interfaces is AXI4, an openstandard by ARM [3], which is used in particular to connectprocessor nodes to memories. As such, most available syn-thesis tools, including Xilinx’ Vivado, provide IP generatorswhose output IP blocks are equipped with AXI interfacesalong with optional veriﬁcation structures written in (System-)Verilog [19].Typically, veriﬁcation of components with such standardinterfaces is provided through so-called bus functional models (BFMs) that abstract complex low-level signal transitionsbetween bus masters and slaves to a transaction level (e.g.,write and read transactions). Unfortunately, such BFMs are notyet available in Chisel – hence, why we include an exampleBFM based around ChiselTest in our framework.

A. Introduction to AXI4

The Advanced eXtensible Interface protocol by ARM isa highly ﬂexible interconnect standard based around ﬁveseparate channels; three for write operations and two forread operations. Operations, known as transactions, consistof a number of transfers across either set of channels. Allchannels share a common clock and active-low reset andbase their transfers on classic ready-valid handshaking. Itis designed with DMA-based memories in focus supportingmultiple outstanding transactions and out-of-order completion.The ﬁve channels are: • Write Address for transferring transaction attributes frommaster to slave • Write Data for transferring write data and strobe frommaster to slave • Write Response for transferring transaction status of awrites from slave to master • Read Address same as

Write Address , but for reads • Read Data for transferring read data from slave to masterConsider for example a write transaction of 16 data ele-ments. First, the master provides transaction attributes (e.g.,target address, burst length, and data size) as a single transferover the

Write Address channel, then the master transfers the 16 data elements one at a time over the

Write Data channel,and ﬁnally, the slave indicates the status of the transaction overthe

Write Response channel. The

Read Address and

Read Data channels may operate independently at the same time. A fulldescription is available in [3].

B. Implementation

Our implementation includes bundles deﬁnining the ﬁvedifferent channels, abstract classes representing both masterand slave entities, transaction-related classes, and of coursethe BFM itself; the

FunctionalMaster class. The BFM isparameterized with a DUT that extends the slave class andprovides a simple, transaction level interface to control theDUT. As such, its two most important public methods are createWriteTrx and createReadTrx which do exactlyas their names indicate; create and enqueue write and readtransactions.Internally, the BFM makes use of ChiselTest’s multithread-ing features to allow for (a) non-blocking calls to the afore-mentioned methods (i.e., one can enqueue multiple transac-tions without waiting for their completion) and (b) emulatingthe channel independence more closely. As such, when, forexample, a write transaction is enqueued and no other writetransactions are in-ﬂight, the BFM spawns three new threads,one for each required channel. The threads each handle thehandshaking necessary to operate the channels.

C. A Simple Example

Returning to the example used before, using the BFM to testa module called

Memory is as simple as shown below. Creatinga write transaction with 16 data elements (minimum burstlength is 1, hence len = 15 means a burst of 16 items) takesjust one call to a method the majority of whose argumentshave default values. It is equally simple to create a subsequentread transaction – but beware that due to the BFM’s parallelexecution style, the channels are indeed independent. As such,not waiting for a write to complete before starting to read fromthe same address may return incorrect results depending on theimplementation of the DUT. class MemoryTester extends

FlatSpec with

ChiselScalatestTester with

Matchers { behavior of "My(cid:32)Memory(cid:32)module" it should "write(cid:32)and(cid:32)read" in { test( new Memory()) { dut => val bfm = new FunctionalMaster(dut) master.createWriteTrx(0,Seq.fill(16)(0x7FFFFFFF), len = = master.createReadTrx(0, len =

15, size = } } } Listing 3. Using the AXI4 BFM with ChiselTest . C

OVERAGE IN C HISEL

One of the main tools used in veriﬁcation is test coverage.This allows veriﬁcation engineers to measure their progressthroughout the testing process and have an idea of howeffective their tests actually are. Coverage can be separatedinto two distinct categories: code coverage and functionalcoverage. code coverage deﬁnes a quantitative measure of thetesting progress, ”How many lines of code have been tested?” ,whereas functional coverage gives a rather qualitative measure, ”How many functionalities have we tested?” . Our solutiongives the veriﬁcation engineer access to two ways of obtainingtheir code coverage and new constructs allowing the deﬁnitionof a veriﬁcation plan and the creation of a functional coveragereport directly integrated into the Chisel testing framework. a) Code Coverage with Treadle:

The ﬁrst part of oursolution is about code coverage, more speciﬁcally line cover-age that was added to the Treadle FIRRTL execution engine.Treadle is a common FIRRTL execution engine used tosimulate designs implemented in Chisel. This engine runs onthe FIRRTL intermediate representation code generated bya given Chisel implementation and allows one to run user-deﬁned tests on the design using frameworks like iotesters or the more recent testers2 . In our pursuit of creating averiﬁcation framework, we found that one way to obtain linecoverage would be to have our framework run on an extendedversion of Treadle that was capable of keeping track of saidinformation.The solution that was used to implement line coverage wasbased off of a method presented by Ira. D. Baxter [5]. The ideais to add additional outputs for each multiplexer in the design.These new ports, which we will call

Coverage Validators , areset depending on the paths taken by each multiplexer andthat information is then gathered at the end of each test andmaintained throughout a test suite. Once the testing is done,we used the outputs gathered from the

Coverage Validators to check wether or not a certain multiplexer path was takenduring the test, all of this resulting in a branch coveragepercentage.This was implemented in Treadle by creating a custompass of the FIRRTL compiler that traverses the AbstractSyntax Tree (

AST ) and adds the wanted outputs and coverageexpressions into the source tree. Once that is done, the

TreadleTester samples those additional outputs every timethe expect method is called and keeps track of their valuesthroughout a test suite. Finally it generates a Scala caseclass containing the following coverage information: • The multiplexer path coverage percentage. • The coverage Validator lines that were covered by a test. • The modiﬁed LoFIRRTL source code in the form of a

List[String] .The

CoverageReport case class can then be serialized, givingthe following report:

COVERAGE: 50.0% of multiplexer paths testedCOVERAGE REPORT: + circuit Test_1 :+ module Test_1 :+ input io_a : UInt<1>+ input io_b_0 : UInt<2>+ input io_b_1 : UInt<2>+ input clock : Clock+ output io_cov_valid_0 : UInt<1>+ output io_cov_valid_1 : UInt<1>+ output out : UInt<2>++ io_cov_valid_0 <= io_a- io_cov_valid_1 <= not(io_a)+ out <= mux(io_a, io_b_0, io_b_1)

The example above is taken for a simple test, where we areonly testing the path where in a is 1. This means that, sincewe only have a single multiplexer, only half of our brancheshave been tested and we would thus want to add a test for thecase where in a is 0. The report can thus be interpreted asfollows: • ” + ” before a line, means that it was executed in at leastone of the tests in the test suite. • ” - ” before a line, means that it wasn’t executed in anyof the tests in the test suite.Treadle thus allows us to obtain coverage at the FIRRTLlevel. A more interesting result would be if the FIRRTL linecoverage would be mapped to the original Chisel source. Thisis possible but challenging, since Treadle only has access tothe original source code through Source locators which mapsome of the FIRRTL lines back to Chisel. This means thatthe code can only be partially mapped and the remainder willhave to be reconstructed using some smart guessing. b) Functional Coverage Directly in Scala:

FunctionalCoverage is on the principal tools used during the veriﬁcationprocess, since it allows one to have a measurement of ”howmuch of the speciﬁcation has been implemented correctly” . Averiﬁcation framework would thus not be complete withoutconstructs allowing one to deﬁne a veriﬁcation plan andretrieve a functional coverage report. The main language usedfor functional coverage is

SystemVerilog , which is why oursolution is based on the same syntax. There are three maincomponents to deﬁning a veriﬁcation plan: • Bin : Deﬁnes a range of values that should be tested for(i.e. what values can we expect to get from a given port). • CoverPoint : Deﬁnes a port that needs to be sampled inthe coverage report. These are deﬁned using a set of bins. • CoverGroup : Deﬁnes a set of

CoverPoint s that need tobe sampled at the same time.Using the above elements, one can deﬁne what’s known as averiﬁcation plan, which tells the coverage reporter what portsneed to be sampled in order to generate a report. In order toimplement said elements in Scala we needed to be able to dothe following:

Deﬁne a veriﬁcation plan (using constructs similar to coverpoint and bins ). • Sample DUT ports (for example by hooking into the

Chisel Testers2 framework). • Keep track of bins to sampled value matches (using a sortof DataBase). • Compile all of the results into a comprehensible CoverageReport.Implementing these elements was done using a structurewhere we had a top-level element, known as our

CoverageReporter which allows the veriﬁcation engineer to deﬁnea veriﬁcation plan using the register method, which it-self stores the coverpoint to bin mappings inside of our CoverageDB . Once the veriﬁcation plan is deﬁned, we cansample our ports using the sample method, which is doneby hooking into

Chisel Testers2 in order to use its peekingcapabilities. At the end of the test suite a functional coveragereport can be generated using the printReport method,which shows us how many of the possible values, deﬁnedby our bin ranges, were obtained during the simulation. val cr = new CoverageReporter cr.register( //Declare CoverPoints //CoverPoint 1 CoverPoint(dut.io.accu , "accu", Bins("lo10", 0 to 10):: Bins("First100", 0 to 100) ::Nil):: //CoverPoint 2 CoverPoint(dut.io.test, "test", Bins("testLo10", 0 to 10) ::Nil):: Nil, //Declare cross points Cross("accuAndTest", "accu", "test", CrossBin("both1", 1 to 1, 1 to 1) ::Nil):: Nil)

The above code snippet is an example of how to deﬁne averiﬁcation plan using our coverage framework. The conceptsare directly taken from

SystemVerilog , so it should be acces-sible to anyone coming from there. One concept, that is usedin the example veriﬁcation plan, which we haven’t presentedyet is the idea of

Cross Coverage deﬁned using the

Cross construct.

Cross Coverage allows one to specify coveragerelations between CoverPoints. This means that a cross deﬁnedbetween, let’s say, coverpoint a and coverpoint b will beused to gather information about when a and b had certainvalues simultaneously. Thus in example veriﬁcation plan weare checking that accu and test take the value 1 at the sametime.Once our veriﬁcation plan is deﬁned, we need to decide whenwe want to sample our cover points. This means that at somepoint in our test, we have to tell our CoverageReporter tosample the values of all of the points deﬁned in our veriﬁcationplan. This can be done, in our example, simply by calling cr.sample() when we are ready to sample our points. Finallyonce our tests are done, we can ask for a coverage report by calling cr.printReport() which results in the followingcoverage report: ============== COVERAGE REPORT ============================== GROUP ID: 1 ================COVER_POINT PORT NAME: accuBIN lo10 COVERING Range 0 to 10 HAS 8 HIT(S)BIN First100 COVERING Range 0 to 100 HAS 9 HIT(S)============================================COVER_POINT PORT NAME: testBIN testLo10 COVERING Range 0 to 10 HAS 8 HIT(S)============================================CROSS_POINT accuAndTest FOR POINTS accu AND testBIN both1 COVERING Range 1 to 1 CROSS Range 1 to 1HAS 1 HIT(S)============================================

An other option would be, for example if we want to doautomated constraint modiﬁcations depending on the currentcoverage, to generate the coverage as a Scala case class and then to use it’s binNcases method to get numerical andreusable coverage results.One ﬁnal element that our framework offers is the pos-sibility to gater delayed coverage relationships between twocoverage points. The idea is similar to how a

Cross works,but this time rather than sampling both points in the samecycle, we rather look at the relation between one point at thestarting cycle and an other point sampled a given number ofcycles later. This number of cycles is called the delay andthere are currently three different ways to specify it: • Exactly delay, means that a hit will only be consideredif the second point is sampled in its range a given numberof cycles after the ﬁrst point was. • Eventually delay, means that a hit will be consideredif the second point is sampled in its range at any pointwithin the following given number of cycles after the ﬁrstpoint was. • Always delay, means that a hit will be considered if thesecond point is sampled in its range during every cyclefor a given number of cycles after the ﬁrst point wassampled.VI. U SE C ASE : H

ARDWARE SORTING

In the process of the research a use case provided byMicrochip was implemented in order to apply developedtesting and veriﬁcation features. In the following section theimplementation of the use case and the connected testing willbe discussed. The code can be found in the project repository.

A. Speciﬁcation

The provided speciﬁcation document describes a priorityqueue which can be used in real time systems for the schedul-ing of deadlines by providing information about the next timerexpiration to the host system. Sorting of the enqueued elementsis conducted by applying the heap sort algorithm. Elements arestructured in a so-called heap which is a tree data structure.The tree needs to be balanced in order for the timer closesto expiring to get to the top of the tree and thus to the headof the queue. This means verifying that every parent node issmaller than the connected child nodes.The log k N deepness of the tree provides good scalabilityin terms of insertion and removal times when the queue sizeincreases, since worst case log k N − k is thenumber of child elements per parent and N is the numberof elements in the heap. A trade-off is the varying delayconnected to the rebalancing of the tree where the queue isunresponsive. If queuing happens in bursts, a buffer could beadded. Here the introduced delay from insertion request toactual appearance of the enqueued value in the heap of courseneeds to be taken into account.In order for the host system to have the ability to distinguishbetween multiple consecutive super cycles and clock cyclesin a super cycle, the values inserted into the queue are splitinto the ﬁelds cyclic and normal priority (time out value). Theremoval functionality of the queue requires a reference system.A reference ID is therefore given together with the element atinsertion, where ID generation and uniqueness are handled bythe host system. B. Implementation

The implemented priority queue is described in chisel. Itis split into 3 modules: The

Heapifier , responsible for thesorting, the

QueueControl , taking care of the general controlﬂow in the queue and the

Memory module which handlesmemory accesses and can search the memory for a speciﬁcreference ID.In order for the priority queue to work efﬁciently it is crucialto optimize memory accesses. Therefore a layout is proposedin the speciﬁcation where all child elements of a certain nodeare stored together under one memory address. This allowssingle memory access fetches of all k children. Since the rootnode has no siblings it is stored alone in a register. This enableseven faster access in certain scenarios which are discussed lateron.One memory row contains k elements each consisting ofthe 3 introduced ﬁelds: cyclic priority, normal priority and theconnected reference ID. In the implemented Memory modulea single sequential memory is instantiated where masking isused to be able to over-write speciﬁc elements in one memoryrow.There are a variety of solutions to the problem of contentaddressability which is required here in order to ﬁnd positionsof elements in the heap by providing their reference ID.Cache-like memory relying on parallel tag comparison couldbe used to achieve fast and constant search times. On theother hand, the existing sequential memory could be searchedlinearly, where k reference ID’s are compared in parallel untilthe searched reference ID is found. A compromise betweenthe two solutions could include splitting memory space overmultiple instances of the latter and thus reducing worst casesearch time. The priority queue is designed to be independent "Warm Up" phase 1for heapify up "Warm Up" phase 2for heapify upread next parent swap required?-> write back childrenelse go idle write back parentreached end?-> go idleelse continue"Warm Up" phase 1for heapify down "Warm Up" phase 2for heapify downread next children swap required?-> write back parentelse go idle write back childrenreached end?-> go idleelse continueidle Fig. 1. The state diagram of the

Heapifier idle wait for heapify upwait for heapifydownwait for searchinit search andcheck if tail or headmatch reset tail cell overwrite removedelement with tailremove headremove tail remove the lastelement in thequeue insert the firstelement in thequeue append new elementto end of queue

Fig. 2. The state diagram of the

QueueControl of the speciﬁc implementation. As a reference, the linearsearch is implemented.The

Heapifier loops from a given starting point in the treeeither upwards or downwards until it hits the root or a childlessnode. In each iteration it is checked whether the parent elementis smaller than its child elements and if not a swap occurs.Once the parent element and child elements of the startingpoint are fetched from memory, only the next parent or blockof children respectively needs to be fetched depending on thelooping direction (up/down). Thus only 3 cycles (1 fetch, 2write backs) are required per swap. The state diagram of the

Heapifier is shown in Figure 1.The task of the

QueueControl is to insert or removeelements and then signalize to the

Heapfier to balance thetree. As it can be seen in Figure 2, there are a series of specialcases where insertion or removal times can be reduced forinstance by taking advantage of the head element being savedin a register. The achieved best and worst case insertion aswell as removal times are presented in Table I. ead insertion 2 cyclesNormal insertion min 7 cycles and max 5 + · log k ( N ) Head removal min 8 cycles and max 6 + · log k ( N ) Tail removal 3 cyclesNormal removal min 12 cycles + search timemax 13 + · log k ( N ) + search time ( N = queued elements, k = number of child elements per node) TABLE IB

EST AND WORST CASE INSERTION AND REMOVAL TIMES

Size k = k = k = k = IMULATED INSERTION / REMOVAL TIMES . C. Testing and Veriﬁcation

All modules and the fully assembled queue were testedwith random stimuli in scala by employing the ChiselTestframework. In order to check whether the dut matched thespeciﬁcation, reference models were written for each module.Most modules could be modelled by a single or multiplefunctions. As a reference model for the whole priority queue,a class was written which simulates state and interaction on aoperation based level. In order to abstract interaction with theDUT, wrapper classes were employed. These make it easy tothink on a transaction or operation based level when writingtests.In the test of the priority queue, purely random pokes areproduced. In order to evaluate how well these pokes werespread over the spectrum of possible or interesting input com-binations, the developed functional coverage feature is used.This allows to evaluate whether interesting or important edgecases are reached by the random input sequence. Furthermore,metrics on how many of the generated pokes actually are validoperations, are collected. The average insertion and removaltimes measured under a random test run are shown in tableVI-B. These numbers are of course heavily dependent onaccess patterns and are as such only representative for thecompletely random test case used here. The tests can be foundin the project repository.VII. UVM, S

YSTEM V ERILOG AND V ERIFICATION

This section serves as a reference on what the UVM can do,how it works, and how some of this (namely the SystemVerilogDirect Programming Interface (DPI)) may be implemented ina Scala based testing framework.The Universal Veriﬁcation Methodology (UVM) is a veri-ﬁcation framework built on top of the testing capabilities ofSystemVerilog. Prior to the introduction of the UVM, the ma-jor EDA simulation vendors supported different frameworks/ methodologies. This meant that a veriﬁcation environmentdesigned for one simulator might not run under another sim-ulator. The main purpose of the UVM was to standardize thetesting framework used across EDA vendors, making it easier for users to use their testbenches across different softwaresuites.The testbench in the UVM is built up around a numberof components. Each component performs only one taskin the testbench, allowing the engineer to make changesto some components without affecting others. For example,the sequencer is the component responsible for generatingtransactions for the DUT, whereas the driver is responsiblefor converting the transaction into pin-level wiggles, i.e.,generating correct start/stop conditions and driving signals. Ifa new sequence of transactions is to be generated, only thesequencer is affected. Likewise, the sequencer does not carehow the transactions are converted into pin-level signals—this is the sole responsibility of the driver. This separationinto several components results in a more structured testbenchdesign as there are fewer dependencies than in a monolithictestbench.The main components of a UVM testbench are as follows:A

Sequence(r): deﬁnes the order of transactions necessaryfor a given purpose, e.g., synchronization or reset sequence.The sequencer is responsible for transferring the transactions,deﬁned by the sequence, to the driver.A

Driver converts transactions into pin-level signals anddrives these signals onto the DUT.An

Interface is a SystemVerilog construct which allowsthe user to group related signals. A DUT may have severalinterfaces attached. The interface is used to avoid hookingdirectly into the DUT, making it easier to test multiple DUTversions.A

Monitor monitors all trafﬁc on the interface, convertingpin-level signals into transaction-level objects that can beoperated on by other components, such as a coverage collectoror scoreboard.An

Agent encapsulates monitor, sequencer and driver, set-ting conﬁguration values. Agents may be set active or passive(with or without a driver and sequencer). An agent is usefulwhen it is necessary to have multiple instances of the samecomponents, e.g., when a 4-port network switch needs fouridentical drivers with different conﬁgurations.A

Scoreboard is used to check whether correct functionalityis achieved. Usually does so by using a “golden model”for co-simulation via the SystemVerilog direct programmingInterface.The

Environment is used to conﬁgure and instantiate allchild components. Environments are typically application-speciﬁc and may be modiﬁed by the test.The

Test is the top-level veriﬁcation component. The testdesigner may choose to perform factory overrides of classesand set conﬁguration values here, which modify the childcomponents.As shown above, even a “Hello, World” example using theUVM requires that the user understands how and why each ofthe different UVM components should be used. The use of somany components causes UVM to have a very steep learningcurve, which may discourage adoption. This also means thatUVM is not the proper testing methodology for small designsr one-off tests due to the initial workload. However, once theinitial setup of the testbench is ﬁnished for large and complexdesigns, generating new tests becomes easier.

A. Scoreboard and DPI

The purpose of the scoreboard is to ensure that the DUTmatches speciﬁcation. When using directed tests (i.e. hand-written tests meant to test a single part of the speciﬁcation),this may be as simple as constructing a list of input/outputvalues and checking these in order. When using randomizedtesting, the scoreboard is usually implemented as a softwaremodel (sometimes called a golden model) which is deﬁnedto be correct. The software model should exactly mimic thefunctionality of the DUT, and thus provide the same outputsgiven the same inputs.The scoreboard may be implemented purely in SystemVer-ilog, or it may be implemented in a C-like language (C, C++ orSystemC). One of the beneﬁts that SystemVerilog adds to theveriﬁcation environment is the ability to interface with C-codethrough the use of the SystemVerilog Direct ProgrammingInterface (DPI). This is an interface which allows C code to becalled from inside the SystemVerilog testbench, and likewiseallows for SystemVerilog code to be called from inside of a Cprogram. Programming low-level simulations of hardware ismost likely easier in C than in plain SystemVerilog. In listing4 is shown a simple example of some SystemVerilog codecalling C code through the SystemVerilog DPI.Notice the inclusion of the header ﬁle svdpi.h , whichcontains the deﬁnitions necessary for interoperability. Oncethis is done, the function must be imported in SystemVerilogby use of the import "DPI-C" statement, after which thefunction may be called as any other SystemVerilog function.Using the DPI is surprisingly simple and painless, and makesit very simple to integrate a C model in the testbench.It should be noted that a scoreboard doesn’t necessarily”rate” the performance of the DUT by comparing it to othermodules, as one might expect from the name. The DUT isonly compared against the reference model, and the ”rating”is how many assertions pass/fail in a given test.

B. Constrained Random Veriﬁcation

Most UVM testbenches employ ”Constrained Random Ver-iﬁcation” (CRV) for generating test stimulus. This is asopposed to using directed testing, where input vectors arepredeﬁned to test a certain behaviour. With CRV, random teststimuli are generated. The ”Constrained” part of CRV impliesthat these values aren’t entirely random, but are chosen tofulﬁll a series of criteria. If eg. the DUT is an ALU witha 5-bit opcode ﬁeld of which only 20 bit-patterns are used,it would be prudent to only generate the 20 valid opcodesfor most test purposes. Speciﬁc tests could then be written toensure proper operation if an invalid opcode was asserted.An example of a class with randomized variables is seen inlisting 5.The keyword constraint is used to constrain a variabledeﬁned with the rand keyword. One randomized variable, // hello.c "svdpi.h" void c_hello( char * name) { printf("Hello(cid:32)from(cid:32)C,(cid:32)%s", name); } // hello.sv module tb; import "DPI-C" function void c_hello(string name); initial main(); task main(); c_hello("SystemVerilog"); endtask endmodule // Outputs: "Hello from C, SystemVerilog" Listing 4. Short example showing how to use the SystemVerilog DPI toexecute C-code from within SystemVerilog. //Myclass.svh class Myclass; rand logic [3:0] bcd; rand logic [7:0] value; rand logic [3:0] a, b, c; rand logic [1:0] op; constraint c_BCD { bcd inside {[0:9]}; } constraint c_value { value dist { [1:254]:/1, }; } constraint c_abc { a < b; b < c; } // op isn’t constrained, but is randomized endclass: Myclass Listing 5. Example SystemVerilog code showing how different values areconstrained. bcd , is a 4-bit ﬁeld which only takes on values in the range 0and 9. The ﬁeld value has a 1 / / / a,b,c must satisfy 0 < a < b < c . The ﬁeld op has no constraints,and will thus randomly toggle between 0, 1, 2 and 3, the onlyvalues it can take on.In listing 6, an object of type Myclass is instantiated andrandomized using the mc.randomize() command. Finally,the SystemVerilog function $ display is used to print thevalue of the BCD ﬁeld. The randomize keyword is a Sys-temVerilog construct which will try to randomize all randomﬁelds of given class. //top.sv ‘include "Myclass.svh" ‘include "Cover.svh" module top; Myclass mc; Cover cov; initial begin; mc = new; cov = new; cov.mc = mc; for( int i=0; i<20; i++) begin mc.randomize(); cov.sample(); end end endmodule Listing 6. Example SystemVerilog code showing how to instantiate andrandomize a class with random ﬁelds.

C. Coverage collection

The purpose of coverage collection, in this case functionalcoverage collection, is to check whether all ”necessary” (asdeﬁned by the veriﬁcation engineer) input combinations havebeen generated. For the ALU mentioned above, it might beinteresting to check whether all opcodes work correctly whenasserted after a reset, and whether over/underﬂow ﬂags arecorrectly set when performing arithmetic operations. In gen-eral, functional coverage is concerned with evaluating whetherall functionality of a device has been tested. This is opposedto line coverage, which evaluates which lines of code wererun during a simulation.An example of functional coverage collection is seen inlisting 7 where the randomized values from before are covered.A covergroup is a label used to collect relevant values underthe same heading. A coverpoint is a directive instructingthe simulator that the given value should be monitored forchanges. In the declaration of the coverpoint, several bins aregenerated. These bins correspond to the bins of a histogram.Every time an event occurs which matches the declarationinside a bin, the counter associated with that bin is incrementedby one. An event may cause multiple bins to increment at thesame time.In the coverage declarations shown in listing 7, the cover-group cg bcd only covers one ﬁeld, bcd . The bin is labeledby prepepending

BCD: in front of the coverpoint directive.10 bins are generated which each sample the values 0-9, andthe remaining values 10-15 are sampled into the default bin others .In the covergroup cg others three coverpoints are setup. The

VAL coverpoint samples 3 ranges of values. Anyvalue in the range [0:20] will cause the counter on bin low to increment by one. Likewise for the other bins inthat coverpoint. The A coverpoint auto-generates one bin foreach possible value it may take on, 16 bins in total, sinceno bins are explicitly declared. The coverpoint OP has onebin, toggle which only increments when mode toggles from to . Finally, the cross statement implements cross //Cover.svh class Cover; Myclass mc; covergroup cg_bcd; BCD: coverpoint mc.bcd { bins valid[] = {[0:9]}; bins others = default ; } endgroup: cg_bcd covergroup cg_others; VAL: coverpoint mc.value { bins low = {[0:20]}; bins high = {[235:255]}; bins bad = {[110:130]}; bins others = default ; } A: coverpoint mc.a; OP: coverpoint mc.op { bins toggle = (0 => 1); } cross A, OP; endgroup: cg_others function new(); cg_bcd = new; cg_others = new; endfunction: new function void sample(); cg_bcd.sample(); cg_others.sample(); endfunction: sample endclass: Cover Listing 7. Examle SystemVerilog code showing how covergrups andcoverpoints are organized. coverage. Cross coverage tracks what values were sampled atmultiple coverpoints at the same time.Using the cross of A and OP , it may be possible to have100% coverage on both coverpoints (ie. all bins have been hitat least once), but the cross coverage may not be at 100% ifeg. OP never toggled while A was 1. Increasing the number ofrandom samples that are generated may alleviate this problem.If it doesn’t, it may be indicative that something is wrong inthe structure of the testbench or DUT.In listing 8, the module from listing 6 has been expandedto also use the coverage collector.In Figure 3, the result of running the 20 iterations is seenfor coverpoints BCD, A, VAL . D. C Integration in Scala

The SystemVerilog DPI is a great addition to the veriﬁcationworkﬂow, as it allows the designer to easily implement a Cmodel into the SystemVerilog testbench. Similar functionalityis available in Scala by leveraging the Java Native Interface(JNI). This is an interface which allows Java / Scala programsto call native code, i.e. code compiled for the speciﬁc CPU itis running on. This is typically encountered as .DLL ﬁles onWindows or .so ﬁles on linux. //top.sv ‘include "Myclass.svh" ‘include "Cover.svh" module top; Myclass mc; Cover cov; initial begin; mc = new; cov = new; cov.mc = mc; for( int i=0; i<20; i++) begin mc.randomize(); cov.sample(); end end endmodule Listing 8. Showcasing how multiple random values are generated and sampledby the coverage collector.

Coverpoint BCD t h e r s Bins H i t s Coverpoint VAL low bad high others

Bins H i t s Coverpoint A

Bins H i t s Fig. 3. Coverage bins generated by running the code in listing 8

When using Chisel and other external libraries, it is recom-mended to use the Scala Build Tool (sbt) to manage the build.However, doing this makes it more difﬁcult to use the defaultJNI integration in Scala. This can be alleviated by use of theplugin sbt-jni by Jakob Odersky.Listing 9 shows an example of a Scala ﬁle with a nativefunction.The annotation @native informs the Scala compiler thatthe function should be found in native code. The annotation @nativeLoader is necessary for use with the plugin. Thename ”native0” is the name of the current SBT project,appended with a 0.Once the above ﬁle has been written, the corresponding Cheader ﬁle is generated by running sbt javah . The contents import ch.jodersky.jni.nativeLoader nativeLoader("native0") //use NativeLoader tosimplify code-reuse in other placesclassMyclass // --- Nativemethodsnative def digitsum(num: Int): Int native def hello(string: String): StringobjectMyclass // Main method to test our nativelibrarydef main(args: Array[String]): Unit =val mc = new MyClassval sum =mc.digitsum(1234)val string =mc.hello("Scala")println(s" string ” ) println ( s ” Digitsumis sum")//Outputs://Hellofrom C, Scala//Digit sum is 10

Listing 9. Example Scala code showing how to integrate native code in Scala. "MyClass.h" JNIEXPORT jint JNICALL Java_MyClass_digitsum (JNIEnv * env, jobject obj, jint num) { int i = num; int sum = 0; while (i) { sum += i % 10; i = i / 10; } return sum; } JNIEXPORT jstring JNICALL Java_MyClass_hello (JNIEnv * env, jobject obj, jstring string) { const char * str =(*env)->GetStringUTFChars(env, string, 0); char buf[128]; snprintf(buf, 128, "Hello(cid:32)from(cid:32)C,(cid:32)%s", str); (*env)->ReleaseStringUTFChars(env, string, str); return (*env)->NewStringUTF(env,buf); } Listing 10. C implementation of the methods declared in Myclass from listing9. of the generated header ﬁle should then be copied into a Ccode ﬁle, where the functions can be implemented. In thiscase, it may look like the contents of listing 10.Notice that primitives such as int can be autocast froma jint , whereas a string must be obtained used using thefunction pointers deﬁned in the

JNIEnv structure. Once thisis done, a CMake makeﬁle is generated by running sbt"nativeInit cmake" , and the C ﬁles are compiled using sbt nativeCompile . If all goes well, the code can then berun with sbt run .If changes are made to any of the function deﬁnitions orScala ﬁle, no other steps are necessary than running sbtrun again. This will also invoke the CMake script generationand compilation steps if necessary. If new native methodsare added to the Scala ﬁle, sbt javah must be run againto generate new C headers.For more information regarding the plugin setup, see theplugin page on Github or the ﬁle HowToJni.md in the docu-mentation repository.III. C

ONCLUSION

In this paper, we introduced ChiselVerify, an open-sourcesolution that should increase a veriﬁcation engineer’s produc-tivity by following the trend of moving towards a more high-level and software like ecosystem for hardware design. Withthis, we brought functional coverage, statement coverage, con-straint random veriﬁcation and transactional modelling to theChisel/Scala ecosystem, thus allowing for the improvement ofcurrent engineer’s efﬁciency and easing the way for softwareengineers to join the hardware veriﬁcation world.

Acknowledgment

This work has been performed as part of the “InﬁnIT– Innovationsnetværk for IT”, UFM case no. 1363-00036B,“High-Level Design and Veriﬁcation of Digital Systems”.R

EFERENCES[1] IEEE standard for SystemVerilog – uniﬁed hardware design, speciﬁca-tion, and veriﬁcation language, 2018.[2] S. Andersen. Vhdl2verilog. https://github.com/chisel-uvm/vhdl2verilog.[3] ARM. Amba axi and ace protocol speciﬁcation axi3, axi4, and axi4-liteace and ace-lite. https://developer.arm.com/documentation/ihi0022/e/.[4] J. Bachrach, H. Vo, B. Richards, Y. Lee, A. Waterman, R. Avizienis,J. Wawrzynek, and K. Asanovic. Chisel: constructing hardware in a scalaembedded language. In

The 49th Annual Design Automation Conference(DAC 2012)

Commun. ACM

Proceedingsof the 38th Annual International Symposium on Computer Architecture ,ISCA ’11, pages 365–376, New York, NY, USA, 2011. Association forComputing Machinery.[10] T. Gingold. Ghdl. https://github.com/ghdl/ghdl.[11] J. L. Hennessy and D. A. Patterson. A new golden age for computerarchitecture.

Commun. ACM , 62(2):48–60, Jan. 2019.[12] R. Lin. ChiselTest. https://github.com/ucb-bar/chisel-testers2.[13] S. Russell and P. Norvig.

Artiﬁcial intelligence: a modern approach .Prentice Hall, 2002.[14] M. Schoeberl.

Digital Design with Chisel . Kindle Direct Publishing,2019. available at https://github.com/schoeberl/chisel-book.[15] M. Schoeberl, S. T. Andersen, K. J. H. Rasmussen, and R. Lin. Towardsan open-source veriﬁcation method with chisel and scala. In

Proceedingsof the Third Workshop on Open-Source EDA Technology (WOSET) ,2020.[16] M. Schoeberl and M. Petersen. Leros: The return of the accumulatormachine. In M. Schoeberl, T. Pionteck, S. Uhrig, J. Brehm, andC. Hochberger, editors,

Architecture of Computing Systems - ARCS 2019- 32nd International Conference, Proceedings , pages 115–127. Springer,May 2019.[17] M. Schoeberl, L. Pezzarossa, and J. Sparsø. S4noc: a minimalisticnetwork-on-chip for real-time multicores. In