Open-Source Verification with Chisel and Scala
Andrew Dobis, Tjark Petersen, Kasper Juul Hesse Rasmussen, Enrico Tolotto, Hans Jakob Damsgaard, Simon Thye Andersen, Richard Lin, Martin Schoeberl
Department of Applied Mathematics and Computer Science, Technical University of Denmark, Lyngby, Denmark
Department of Electrical Engineering and Computer Sciences, UC Berkeley, Berkeley
Abstract—Performance increase with general-purpose processors has come to a halt. We can no longer depend on Moore's Law to increase computing performance. The only way to achieve higher performance or lower energy consumption is by building domain-specific hardware accelerators. To efficiently design and verify those domain-specific accelerators, we need agile hardware development. One of the main obstacles when proposing such a modern method is the lack of modern tools to attack it. To be able to verify a design in such a time-constrained development method, one needs efficient tools both for design and verification.

This paper thus proposes ChiselVerify, an open-source tool for verifying circuits described in any hardware description language. It builds on top of the Chisel hardware construction language and uses Scala to drive the verification using a testing strategy inspired by the Universal Verification Methodology (UVM) and adapted for designs described in Chisel. ChiselVerify is based on three key ideas. First, our solution greatly increases the productivity of the verification engineer by allowing hardware testing to be done in a modern high-level programming environment. Second, the framework works with any hardware description language thanks to the flexibility of Chisel blackboxes. Finally, the solution is well integrated into the existing Chisel universe, making it an extension of currently existing testing libraries.

We implement ChiselVerify in a way inspired by the functionalities found in SystemVerilog. This allows one to use functional coverage, constrained-random verification, bus functional models, transaction-level modeling, and much more during the verification process of a design in a contemporary high-level programming ecosystem.
Index Terms—digital design, verification, Chisel, Scala
I. INTRODUCTION
We can no longer depend on Moore's Law to increase computing performance [9]. Performance increase with general-purpose processors has come to a halt. The only way to achieve higher performance or lower energy consumption is by building domain-specific hardware accelerators [7]. These accelerators can be built in chips or in FPGAs in the cloud. The production of a chip is costly. Therefore, it is essential to get the design right at the first tape-out. Thorough testing and verification of the design is mandatory.

To efficiently develop and verify those accelerators, we can learn from software development trends such as agile software development [6]. We believe that we need to adapt to agile hardware development [11].

Furthermore, as accelerators become part of cloud services, i.e., FPGAs in the cloud, software developers will increasingly need to adapt critical algorithms to FPGAs to enhance performance. Hence, it is imperative to make accelerator design accessible to software developers. By adapting hardware accelerator design to the methods and tools of contemporary software design, it is possible to bridge both domains, catering for a more uniform hardware/software development process.

Until a few years ago, the two main design languages, Verilog and VHDL, dominated the design and testing of digital circuits. However, compared to software development and testing, digital design and testing methods and tools lack several decades of development. We propose a tool that leverages software development and testing methods for digital design. Based on the hardware construction language Chisel [4], which itself is embedded in Scala, our framework reimagines functionalities from the Universal Verification Methodology (UVM) of SystemVerilog [1] and adapts them for the existing Chisel ecosystem.

We thus developed a method and concrete tools for agile hardware development.
ChiselVerify combines tools, languages, development, and testing methods from the last decades of software development and applies them to hardware design. We aim to raise the tooling level for digital design to increase productivity. The workload demanded for the verification (testing) of digital systems is about double the time of developing them in the first place.

Using the power and flexibility of Chisel blackboxes, our tool can be used to verify designs implemented in any of the major hardware description languages (i.e., VHDL or Verilog) with little overhead. Furthermore, golden models described in any programming language can be used via the Java Native Interface (JNI). Our work builds upon existing open-source projects and is thus also open-source.

We developed an object-oriented and functional framework for verification in Scala. This framework is inspired by UVM, but leverages Scala's conciseness through the combination of object-oriented programming with functional programming. An initial experiment of testing the accumulator circuit of the Leros processor [16] showed that a test written with UVM was about 800 lines of code, whereas a Scala-based test was around 80 lines of code [15]. However, UVM supports more functionality than a plain ChiselTest in Scala.

Within our verification framework, we support mixed-language verification. Verilog can easily be combined with Chisel, as Chisel generates Verilog, and we use ChiselTest as a driver for the open-source Verilog simulator Verilator. With the Yosys synthesis suite [18] and GHDL [10] we can translate VHDL into Verilog.

A verification method is only usable when it can handle mixed-source designs. This means a Scala-driven method must be able to test components written in Verilog, VHDL, and SystemVerilog.

Chisel has support for blackboxes, which allows the use of Verilog code within a Chisel design. Therefore, it is relatively easy to integrate Verilog components when wrapped into a blackbox.
However, this forces Chisel to use Verilator instead of Treadle to run the simulation, impacting startup time.

Chisel does not fully support VHDL. It can support VHDL using VCS, but there is no open-source solution available for VHDL simulation. For companies with a lot of source code written in VHDL this is a concern, as they must be able to integrate their existing IP in a Scala/Chisel-based design and verification workflow. All major commercial simulation and synthesis tools support mixed-language designs, but no open-source tools exist that provide the same functionality.

To alleviate this issue, we use the open-source Yosys synthesis suite [18]. Yosys is an open-source digital hardware synthesis suite for Verilog. Yosys also has a variety of plugins, one of these being a plugin for using GHDL [10], an open-source VHDL simulator. By using Yosys in conjunction with GHDL, VHDL files are compiled to an RTL-based intermediate representation, which is then written to a Verilog file using Yosys. GHDL has full support for IEEE 1076 VHDL 1987, 1993, 2002, and a subset of 2008. A working solution named VHDL2Verilog has been made for this, which has been tested with certain simple VHDL designs [2].

This paper is an extension of [15]. In the following sections, we will explore the different backgrounds on which ChiselVerify was based.

II. BACKGROUND AND STATE-OF-THE-ART

VHDL and Verilog are the classic hardware description languages, which first appeared in the 1980s. SystemVerilog [1], as an extension to Verilog, adds features from VHDL for the hardware description and object-oriented features for verification. Recent advances with SystemVerilog and Chisel [4], [14] have brought object-oriented programming into the digital design and verification process.

Chisel is a "hardware construction language" embedded in Scala, to describe digital circuits [4]. Scala/Chisel brings object-oriented and functional programming into the world of digital design.
For hardware generation and testing, the full Scala language and the Scala and Java libraries are available. As the full power of Scala and Java is available to the verification engineer, the verification process is also made more efficient.

Chisel allows the user to write hardware generators in Scala, an object-oriented and functional language. For example, we read in the string-based schedules for a network-on-chip [17] and convert them with a few lines of Scala code into a hardware table to drive the multiplexers of the router and the network interface.

Chisel is solely a hardware construction language, and thus all valid Chisel code maps to synthesizable hardware. By separating the hardware construction and hardware verification languages, it becomes impossible to write non-synthesizable hardware, which in turn speeds up the design process.

Chisel and Scala execute on the Java virtual machine and therefore have very good interoperability with Java. Hence, we can leverage a large pool of Java libraries for hardware design and verification. Furthermore, the namespace of packages in Scala/Java simplifies the integration of external components. Open-source hardware components in Chisel can be organized like software libraries on Maven servers.

SystemVerilog adds object-oriented concepts for the non-synthesizable verification code. The SystemVerilog direct programming interface [8] allows the programmer to call C functions inside a SystemVerilog (UVM) testbench. This enables co-simulation with a "golden model" written in C while the testbench verifies the device under test (DUT). With ChiselTest we can co-simulate with Java and Scala models and use the Java Native Interface to co-simulate with models written in C.

The Universal Verification Methodology (UVM) is an open-source collection of SystemVerilog code, which is becoming popular in industry.
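To give a feel for the generator style mentioned above, the schedule-to-table conversion can be sketched in plain Scala, stripped of the Chisel types. The one-character-per-slot format and the idle encoding below are invented for illustration; the actual network-on-chip schedules [17] differ.

```scala
// Toy generator: turn a textual schedule into an index table that
// could drive a router multiplexer. Format and names are hypothetical.
object ScheduleTable {
  // Each character selects an input port; '-' means idle.
  // Digit ports are shifted to 1..10 so that 0 can encode idle.
  def fromString(sched: String): Seq[Int] =
    sched.map {
      case '-' => 0
      case c   => c - '0' + 1
    }
}
```

In a real generator the resulting `Seq[Int]` would be mapped to a `Vec` of hardware constants; the point is that the parsing is ordinary Scala.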
SystemVerilog has become a complex language with more than 250 keywords, and it is unclear which tools support which language constructs for hardware description. In contrast, with Chisel, when the program compiles, it is synthesizable hardware. Chisel is a small language, whose cheat sheet fits on two pages. The power of Chisel comes from its embedding in Scala. Furthermore, as classic hardware description languages are niche products, not many tools or libraries are available. With Chisel on Scala we have the choice of different integrated development environments (IDEs), testing infrastructure (e.g., ScalaTest), and many free libraries.

The Java Native Interface (JNI) allows for similar functionality in Java programs, allowing them to call C functions. By using Scala, which is built on Java, it is our hope to use the JNI together with Scala's test frameworks. The aim is to develop a framework for co-simulation with Scala/Chisel testers and a C-based golden model. This should allow companies to keep their existing C models, but move their simulation workflow into Scala/Chisel testers.

The digital design described in Chisel can be tested and verified with ChiselTest [12], a non-synthesizable testing framework for Chisel. ChiselTest emphasizes usability and simplicity while providing ways to scale up complexity. Fundamentally, ChiselTest is a Scala library that provides access into the simulator through operations like poke (write a value into the circuit), peek (read a value from the circuit into the test framework), and step (advance time). As such, tests written in ChiselTest are just Scala programs, imperative code that runs one line after the next. This structure uses the latest programming language developments that have been implemented in Scala and provides a clean and concise interface, unlike approaches that attempt to reinvent the wheel, like UVM. Furthermore, ChiselTest tries to enable testing best practices from software engineering.
Its lightweight syntax encourages writing targeted unit tests by making small tests easy. Clear and clean test code also enables the test-as-documentation pattern, demonstrating a module's behavior from a temporal perspective.

III. CONSTRAINED RANDOM VERIFICATION
The complexity of digital designs is growing with the capacity of the silicon. A decade ago, the industry started to move away from "directed" testing towards functional coverage and formal methods. One of the pillars of functional verification is constraint programming. Constraint programming (CP) is a programming paradigm that has been developed since the mid-1980s and emerged as a further development of logic programming. Constraint-based programming allows constraints and their solution mechanisms to be integrated into a programming language. With constraint programming, the user describes the problem in a declarative way, while the solution process takes a back seat from the user's perspective. A subset of these problems is the so-called constraint satisfaction problems (CSP), which are mathematical problems defined as a set of objects such that their state must satisfy several constraints. A CSP represents the entities of a problem as a finite homogeneous collection of constraints.

typedef enum {UNICAST=11, MULTICAST, BROADCAST} pkt_type;

class frame_t;
  rand pkt_type ptype;
  rand integer len;
  rand bit [7:0] payload [];
  // Constrain the members
  constraint common { payload.size() == len; }
  constraint unicast { len <= 2; ptype == UNICAST; }
  constraint multicast { len >= 3; len <= 4; ptype == MULTICAST; }
endclass

Listing 1. Random object in SystemVerilog
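The backtracking search that underlies any CSP-based randomizer can be sketched in a few lines of plain Scala. The names (`MiniCsp`, `solve`) are illustrative and this is not ChiselVerify's solver; in particular, it checks constraints only on complete assignments and performs no arc consistency.

```scala
// Minimal CSP solver sketch: backtracking over finite integer domains,
// with constraints as predicates over a complete assignment.
// Purely illustrative; not ChiselVerify's actual solver.
object MiniCsp {
  type Assignment = Map[String, Int]

  def solve(vars: List[String],
            domains: Map[String, List[Int]],
            constraints: List[Assignment => Boolean],
            partial: Assignment = Map.empty): Option[Assignment] =
    vars match {
      case Nil =>
        // All variables assigned: accept iff every constraint holds.
        if (constraints.forall(_(partial))) Some(partial) else None
      case v :: rest =>
        // Try each domain value, backtracking on failure.
        domains(v).iterator
          .map(x => solve(rest, domains, constraints, partial + (v -> x)))
          .collectFirst { case Some(a) => a }
    }
}
```

Randomizing the frame of Listing 1 under the unicast constraints then amounts to solving for `len` and `ptype` with predicates `len <= 2` and `ptype == 11`.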
Listing 1 shows a class named frame_t. It uses the rand keyword for the variables len, ptype, and payload; these are the variables that can be randomized. Constraints on these variables are then declared in the "common", "unicast", and "multicast" constraint groups. Each class in SystemVerilog has an intrinsic method called randomize(), which causes new values to be selected for all the variables declared with the rand keyword. The selected value for each variable will respect the constraints applied to it. If there are rand variables that are unconstrained, a random value inside their domain will be assigned. Combining random classes using the inheritance OOP paradigm allows the creation of general-purpose models that can be constrained to perform domain-specific functions.

In the research process, a CSP solver was implemented in Scala based on the method described in [13]. The implementation is composed of two main components. The first one is the CSP solver itself, which uses a combination of backtracking and arc consistency to generate solutions for well-defined problems. The second component is a small DSL, which allows users to declare and randomize objects.

object pktType extends SVEnumeration {
  val UNICAST: Value = Value(11)
  val MULTICAST: Value = Value(0)
  val BROADCAST: Value = Value(1)
  val domainValues = {
    values.map(x => x.id).toList
  }
}

class Frame extends Random {
  import pktType._
  var pType: RandInt = rand(pType, pktType.domainValues())
  var len: RandInt = rand(len, 0 to 10 toList)
  var noRepeat: RandCInt = randc(noRepeat, 0 to 1 toList)
  var payload: RandInt = rand(payload, 0 to 7 toList)
  val common = constraintBlock(
    binary((len, payload) => len == payload)
  )
  val unicast = constraintBlock(
    unary(len => len <= 2),
    unary(pType => pType == UNICAST.id)
  )
  val multicast = constraintBlock(
    unary(len => len >= 3),
    unary(len => len <= 4),
    unary(pType => pType == MULTICAST.id)
  )
}

Listing 2. Random object in Scala
Listing 2 shows an example of a random object. Contrary to SystemVerilog, to declare a random object, the user has to extend the class from the Random base class provided by the library. After that, each random variable has to be declared of type RandInt and initialized with the rand macro. Finally, as in SystemVerilog, inheriting from the Random base class exposes the method random, which assigns random values to the random fields of the class.

IV. VERIFICATION OF AXI4-INTERFACED COMPONENTS
Another solution to the ever-increasing complexity of digital designs is to use standardized interfaces, which enable greater reuse. One such standard interface is AXI4, an open standard by ARM [3], which is used in particular to connect processor nodes to memories. As such, most available synthesis tools, including Xilinx's Vivado, provide IP generators whose output IP blocks are equipped with AXI interfaces along with optional verification structures written in (System-)Verilog [19].

Typically, verification of components with such standard interfaces is provided through so-called bus functional models (BFMs) that abstract the complex low-level signal transitions between bus masters and slaves to a transaction level (e.g., write and read transactions). Unfortunately, such BFMs are not yet available in Chisel; hence, we include an example BFM based around ChiselTest in our framework.
A. Introduction to AXI4
The Advanced eXtensible Interface protocol by ARM is a highly flexible interconnect standard based around five separate channels: three for write operations and two for read operations. Operations, known as transactions, consist of a number of transfers across either set of channels. All channels share a common clock and active-low reset and base their transfers on classic ready-valid handshaking. The protocol is designed with DMA-based memories in focus, supporting multiple outstanding transactions and out-of-order completion. The five channels are:

• Write Address for transferring transaction attributes from master to slave
• Write Data for transferring write data and strobe from master to slave
• Write Response for transferring the transaction status of a write from slave to master
• Read Address, same as Write Address, but for reads
• Read Data for transferring read data from slave to master

Consider for example a write transaction of 16 data elements. First, the master provides transaction attributes (e.g., target address, burst length, and data size) as a single transfer over the Write Address channel; then the master transfers the 16 data elements one at a time over the Write Data channel; and finally, the slave indicates the status of the transaction over the Write Response channel. The Read Address and Read Data channels may operate independently at the same time. A full description is available in [3].
B. Implementation
Our implementation includes bundles defining the five different channels, abstract classes representing both master and slave entities, transaction-related classes, and of course the BFM itself: the FunctionalMaster class. The BFM is parameterized with a DUT that extends the slave class and provides a simple, transaction-level interface to control the DUT. As such, its two most important public methods are createWriteTrx and createReadTrx, which do exactly as their names indicate: create and enqueue write and read transactions.

Internally, the BFM makes use of ChiselTest's multithreading features to allow for (a) non-blocking calls to the aforementioned methods (i.e., one can enqueue multiple transactions without waiting for their completion) and (b) emulating the channel independence more closely. As such, when, for example, a write transaction is enqueued and no other write transactions are in flight, the BFM spawns three new threads, one for each required channel. The threads each handle the handshaking necessary to operate the channels.
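The per-channel threading idea can be mimicked in plain Scala with one `Future` per channel. This is only a software analogy with invented names; the actual BFM uses ChiselTest's fork/join threading, not Scala futures.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Software analogy of one AXI-style write transaction driven over
// independent channels, each handled by its own thread. Illustrative
// only; not the FunctionalMaster implementation.
object AxiLikeWrite {
  def writeTransaction(addr: Int, data: Seq[Int]): String = {
    val aw = Future(s"AW:$addr")          // address channel transfer
    val w  = Future(s"W:${data.length}")  // data channel transfers
    // The response is only produced once address and data are done,
    // mirroring how the slave issues B after AW and W complete.
    val b = for { _ <- aw; _ <- w } yield "B:OKAY"
    Await.result(b, 2.seconds)
  }
}
```

The address and data threads run concurrently, while the response depends on both, which is the essence of the channel independence the BFM emulates.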
C. A Simple Example
Returning to the example used before, using the BFM to test a module called Memory is as simple as shown below. Creating a write transaction with 16 data elements (the minimum burst length is 1, hence len = 15 means a burst of 16 items) takes just one call to a method, the majority of whose arguments have default values. It is equally simple to create a subsequent read transaction. But beware that, due to the BFM's parallel execution style, the channels are indeed independent. As such, not waiting for a write to complete before starting to read from the same address may return incorrect results depending on the implementation of the DUT.

class MemoryTester extends FlatSpec with
    ChiselScalatestTester with Matchers {
  behavior of "My Memory module"
  it should "write and read" in {
    test(new Memory()) { dut =>
      val bfm = new FunctionalMaster(dut)
      bfm.createWriteTrx(0, Seq.fill(16)(0x7FFFFFFF), len = 15)
      bfm.createReadTrx(0, len = 15)
    }
  }
}

Listing 3. Using the AXI4 BFM with ChiselTest

V. COVERAGE IN CHISEL
One of the main tools used in verification is test coverage. This allows verification engineers to measure their progress throughout the testing process and have an idea of how effective their tests actually are. Coverage can be separated into two distinct categories: code coverage and functional coverage. Code coverage gives a quantitative measure of the testing progress ("How many lines of code have been tested?"), whereas functional coverage gives a rather qualitative measure ("How many functionalities have we tested?"). Our solution gives the verification engineer access to two ways of obtaining code coverage, and new constructs allowing the definition of a verification plan and the creation of a functional coverage report directly integrated into the Chisel testing framework.

a) Code Coverage with Treadle:
The first part of our solution concerns code coverage, more specifically line coverage, which was added to the Treadle FIRRTL execution engine. Treadle is a common FIRRTL execution engine used to simulate designs implemented in Chisel. This engine runs on the FIRRTL intermediate representation code generated by a given Chisel implementation and allows one to run user-defined tests on the design using frameworks like iotesters or the more recent testers2. In our pursuit of creating a verification framework, we found that one way to obtain line coverage would be to have our framework run on an extended version of Treadle that is capable of keeping track of said information.

The solution used to implement line coverage is based on a method presented by Ira D. Baxter [5]. The idea is to add additional outputs for each multiplexer in the design. These new ports, which we call Coverage Validators, are set depending on the paths taken by each multiplexer, and that information is then gathered at the end of each test and maintained throughout a test suite. Once the testing is done, we use the outputs gathered from the Coverage Validators to check whether or not a certain multiplexer path was taken during the test, all of this resulting in a branch coverage percentage.

This was implemented in Treadle by creating a custom pass of the FIRRTL compiler that traverses the abstract syntax tree (AST) and adds the wanted outputs and coverage expressions into the source tree. Once that is done, the TreadleTester samples those additional outputs every time the expect method is called and keeps track of their values throughout a test suite. Finally, it generates a Scala case class containing the following coverage information:

• The multiplexer path coverage percentage.
• The Coverage Validator lines that were covered by a test.
• The modified LoFIRRTL source code in the form of a List[String].

The CoverageReport case class can then be serialized, giving the following report:
COVERAGE: 50.0% of multiplexer paths tested
COVERAGE REPORT:
+ circuit Test_1 :
+   module Test_1 :
+     input io_a : UInt<1>
+     input io_b_0 : UInt<2>
+     input io_b_1 : UInt<2>
+     input clock : Clock
+     output io_cov_valid_0 : UInt<1>
+     output io_cov_valid_1 : UInt<1>
+     output out : UInt<2>
+
+     io_cov_valid_0 <= io_a
-     io_cov_valid_1 <= not(io_a)
+     out <= mux(io_a, io_b_0, io_b_1)
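The percentage at the top of such a report comes from simple bookkeeping: each multiplexer contributes two paths (condition true and condition false), and coverage is the fraction of paths observed across the whole test suite. A stripped-down software model of that arithmetic (names invented for illustration, not Treadle's implementation):

```scala
// Toy model of mux-path (branch) coverage accounting.
// observed: (mux name, condition value) pairs seen across all tests.
object MuxCoverage {
  def percent(observed: Set[(String, Boolean)], muxes: Set[String]): Double =
    // Each mux has two paths, so full coverage is 2 * muxes.size pairs.
    100.0 * observed.count { case (m, _) => muxes(m) } / (2 * muxes.size)
}
```

With a single mux and only the true path observed, this yields exactly the 50.0% shown in the report above.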
The example above is taken from a simple test, where we are only testing the path where io_a is 1. This means that, since we only have a single multiplexer, only half of our branches have been tested, and we would thus want to add a test for the case where io_a is 0. The report can be interpreted as follows:

• "+" before a line means that it was executed in at least one of the tests in the test suite.
• "-" before a line means that it wasn't executed in any of the tests in the test suite.

Treadle thus allows us to obtain coverage at the FIRRTL level. A more interesting result would be if the FIRRTL line coverage could be mapped to the original Chisel source. This is possible but challenging, since Treadle only has access to the original source code through source locators, which map some of the FIRRTL lines back to Chisel. This means that the code can only be partially mapped and the remainder has to be reconstructed using some smart guessing.

b) Functional Coverage Directly in Scala:
Functional coverage is one of the principal tools used during the verification process, since it allows one to measure "how much of the specification has been implemented correctly". A verification framework would thus not be complete without constructs allowing one to define a verification plan and retrieve a functional coverage report. The main language used for functional coverage is SystemVerilog, which is why our solution is based on the same syntax. There are three main components to defining a verification plan:

• Bin: defines a range of values that should be tested for (i.e., what values we can expect to get from a given port).
• CoverPoint: defines a port that needs to be sampled in the coverage report. These are defined using a set of bins.
• CoverGroup: defines a set of CoverPoints that need to be sampled at the same time.

Using the above elements, one can define what is known as a verification plan, which tells the coverage reporter what ports need to be sampled in order to generate a report. In order to implement said elements in Scala, we needed to be able to do the following:

• Define a verification plan (using constructs similar to coverpoint and bins).
• Sample DUT ports (for example, by hooking into the Chisel Testers2 framework).
• Keep track of bin-to-sampled-value matches (using a sort of database).
• Compile all of the results into a comprehensible coverage report.

Implementing these elements was done using a structure with a top-level element, known as our CoverageReporter, which allows the verification engineer to define a verification plan using the register method, which itself stores the coverpoint-to-bin mappings inside of our CoverageDB. Once the verification plan is defined, we can sample our ports using the sample method, which is done by hooking into Chisel Testers2 in order to use its peeking capabilities. At the end of the test suite, a functional coverage report can be generated using the printReport method, which shows us how many of the possible values, defined by our bin ranges, were obtained during the simulation.

val cr = new CoverageReporter
cr.register(
  // Declare CoverPoints
  // CoverPoint 1
  CoverPoint(dut.io.accu, "accu",
    Bins("lo10", 0 to 10) ::
    Bins("First100", 0 to 100) :: Nil) ::
  // CoverPoint 2
  CoverPoint(dut.io.test, "test",
    Bins("testLo10", 0 to 10) :: Nil) :: Nil,
  // Declare cross points
  Cross("accuAndTest", "accu", "test",
    CrossBin("both1", 1 to 1, 1 to 1) :: Nil) :: Nil)
The above code snippet is an example of how to define a verification plan using our coverage framework. The concepts are directly taken from SystemVerilog, so it should be accessible to anyone coming from there. One concept used in the example verification plan which we haven't presented yet is the idea of cross coverage, defined using the Cross construct. Cross coverage allows one to specify coverage relations between CoverPoints. This means that a cross defined between, let's say, coverpoint a and coverpoint b will be used to gather information about when a and b had certain values simultaneously. Thus, in the example verification plan, we are checking that accu and test take the value 1 at the same time.

Once our verification plan is defined, we need to decide when we want to sample our cover points. This means that at some point in our test, we have to tell our CoverageReporter to sample the values of all of the points defined in our verification plan. This can be done, in our example, simply by calling cr.sample() when we are ready to sample our points. Finally, once our tests are done, we can ask for a coverage report by calling cr.printReport(), which results in the following coverage report:

============ COVERAGE REPORT ============
============ GROUP ID: 1 ============
COVER_POINT PORT NAME: accu
BIN lo10 COVERING Range 0 to 10 HAS 8 HIT(S)
BIN First100 COVERING Range 0 to 100 HAS 9 HIT(S)
============================================
COVER_POINT PORT NAME: test
BIN testLo10 COVERING Range 0 to 10 HAS 8 HIT(S)
============================================
CROSS_POINT accuAndTest FOR POINTS accu AND test
BIN both1 COVERING Range 1 to 1 CROSS Range 1 to 1 HAS 1 HIT(S)
============================================
Another option would be, for example if we want to do automated constraint modifications depending on the current coverage, to generate the coverage as a Scala case class and then use its binNcases method to get numerical and reusable coverage results.

One final element that our framework offers is the possibility to gather delayed coverage relationships between two coverage points. The idea is similar to how a Cross works, but this time, rather than sampling both points in the same cycle, we look at the relation between one point at the starting cycle and another point sampled a given number of cycles later. This number of cycles is called the delay, and there are currently three different ways to specify it:

• Exactly delay means that a hit will only be considered if the second point is sampled in its range a given number of cycles after the first point was.
• Eventually delay means that a hit will be considered if the second point is sampled in its range at any point within the following given number of cycles after the first point was.
• Always delay means that a hit will be considered if the second point is sampled in its range during every cycle for a given number of cycles after the first point was sampled.

VI. USE CASE: HARDWARE SORTING
In the process of the research, a use case provided by Microchip was implemented in order to apply the developed testing and verification features. In the following section, the implementation of the use case and the connected testing will be discussed. The code can be found in the project repository.
A. Specification
The provided specification document describes a priority queue which can be used in real-time systems for the scheduling of deadlines by providing information about the next timer expiration to the host system. Sorting of the enqueued elements is conducted by applying the heap sort algorithm. Elements are structured in a so-called heap, which is a tree data structure. The tree needs to be balanced in order for the timer closest to expiring to get to the top of the tree and thus to the head of the queue. This means verifying that every parent node is smaller than the connected child nodes.

The log_k(N) depth of the tree provides good scalability in terms of insertion and removal times when the queue size increases, since the worst case takes on the order of log_k(N) steps, where k is the number of child elements per parent and N is the number of elements in the heap. A trade-off is the varying delay connected to the rebalancing of the tree, during which the queue is unresponsive. If queuing happens in bursts, a buffer could be added. Here, the introduced delay from insertion request to actual appearance of the enqueued value in the heap of course needs to be taken into account.

In order for the host system to have the ability to distinguish between multiple consecutive super cycles and clock cycles in a super cycle, the values inserted into the queue are split into the fields cyclic and normal priority (time-out value). The removal functionality of the queue requires a reference system. A reference ID is therefore given together with the element at insertion, where ID generation and uniqueness are handled by the host system.

B. Implementation
The implemented priority queue is described in Chisel. It is split into three modules: the
Heapifier, responsible for the sorting, the
QueueControl, taking care of the general control flow in the queue, and the
Memory module, which handles memory accesses and can search the memory for a specific reference ID.

In order for the priority queue to work efficiently, it is crucial to optimize memory accesses. Therefore, the specification proposes a layout where all child elements of a certain node are stored together under one memory address. This allows all k children to be fetched with a single memory access. Since the root node has no siblings, it is stored alone in a register. This enables even faster access in certain scenarios, which are discussed later on.

One memory row contains k elements, each consisting of the three introduced fields: cyclic priority, normal priority, and the connected reference ID. In the implemented Memory module, a single sequential memory is instantiated, where masking is used to overwrite specific elements in one memory row.

There is a variety of solutions to the problem of content addressability, which is required here in order to find the positions of elements in the heap given their reference ID. Cache-like memory relying on parallel tag comparison could be used to achieve fast and constant search times. On the other hand, the existing sequential memory could be searched linearly, where k reference IDs are compared in parallel until the searched reference ID is found. A compromise between the two solutions could split the memory space over multiple instances of the latter and thus reduce the worst-case search time. The priority queue is designed to be independent of the specific implementation. As a reference, the linear search is implemented.

Fig. 1. The state diagram of the Heapifier.

Fig. 2. The state diagram of the QueueControl.

The
Heapifier loops from a given starting point in the tree either upwards or downwards until it hits the root or a childless node. In each iteration, it is checked whether the parent element is smaller than its child elements, and if not, a swap occurs. Once the parent element and child elements of the starting point have been fetched from memory, only the next parent or block of children, respectively, needs to be fetched, depending on the looping direction (up/down). Thus only 3 cycles (1 fetch, 2 write-backs) are required per swap. The state diagram of the Heapifier is shown in Figure 1.

The task of the
QueueControl is to insert or remove elements and then signal the Heapifier to balance the tree. As can be seen in Figure 2, there is a series of special cases where insertion or removal times can be reduced, for instance by taking advantage of the head element being kept in a register. The achieved best- and worst-case insertion and removal times are presented in Table I.

TABLE I: BEST- AND WORST-CASE INSERTION AND REMOVAL TIMES
Head insertion: 2 cycles
Normal insertion: min 7 cycles, max 5 + 3 · log_k(N) cycles
Head removal: min 8 cycles, max 6 + 3 · log_k(N) cycles
Tail removal: 3 cycles
Normal removal: min 12 cycles + search time, max 13 + 3 · log_k(N) cycles + search time
(N = number of queued elements, k = number of child elements per node)

TABLE II: SIMULATED INSERTION/REMOVAL TIMES for different queue sizes and values of k

C. Testing and Verification
All modules and the fully assembled queue were tested with random stimuli in Scala by employing the ChiselTest framework. In order to check whether the DUT matched the specification, reference models were written for each module. Most modules could be modelled by one or more functions. As a reference model for the whole priority queue, a class was written which simulates state and interaction on an operation-based level. In order to abstract the interaction with the DUT, wrapper classes were employed. These make it easy to think on a transaction- or operation-based level when writing tests.

In the test of the priority queue, purely random pokes are produced. In order to evaluate how well these pokes are spread over the spectrum of possible or interesting input combinations, the developed functional coverage feature is used. This makes it possible to evaluate whether interesting or important edge cases are reached by the random input sequence. Furthermore, metrics on how many of the generated pokes actually are valid operations are collected. The average insertion and removal times measured in a random test run are shown in Table II. These numbers are of course heavily dependent on access patterns and are as such only representative of the completely random test case used here. The tests can be found in the project repository.
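As an illustration of such an operation-level reference model, the sketch below is a deliberately simplified, hypothetical version: the actual model in the project repository also tracks cyclic priorities and timing. It keeps the queue contents in software so that DUT responses to insert, head, and remove operations can be compared against it.

```scala
// Simplified, hypothetical sketch of an operation-level reference model
// for the priority queue (the real model in the repository is richer).
// Each element carries a priority and the host-supplied reference ID.
final case class Elem(prio: Int, refId: Int)

final class QueueModel {
  private var elems = Vector.empty[Elem]

  // Insert mirrors the DUT operation; refId uniqueness is the host's
  // responsibility, as stated in the specification.
  def insert(prio: Int, refId: Int): Unit =
    elems = (elems :+ Elem(prio, refId)).sortBy(_.prio)

  // The head is the element closest to expiring (smallest priority).
  def head: Option[Elem] = elems.headOption

  // Removal by reference ID; returns whether an element was removed.
  def remove(refId: Int): Boolean = {
    val before = elems.size
    elems = elems.filterNot(_.refId == refId)
    elems.size != before
  }
}
```

A test can then poke the DUT and the model with the same operation stream and assert that both report the same head element after every step.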
VII. UVM, SYSTEMVERILOG AND VERIFICATION
This section serves as a reference on what the UVM can do, how it works, and how some of it (namely the SystemVerilog Direct Programming Interface (DPI)) may be implemented in a Scala-based testing framework.

The Universal Verification Methodology (UVM) is a verification framework built on top of the testing capabilities of SystemVerilog. Prior to the introduction of the UVM, the major EDA simulation vendors supported different frameworks and methodologies. This meant that a verification environment designed for one simulator might not run under another simulator. The main purpose of the UVM was to standardize the testing framework used across EDA vendors, making it easier for users to reuse their testbenches across different software suites.

The testbench in the UVM is built up around a number of components. Each component performs only one task in the testbench, allowing the engineer to make changes to some components without affecting others. For example, the sequencer is the component responsible for generating transactions for the DUT, whereas the driver is responsible for converting the transactions into pin-level wiggles, i.e., generating correct start/stop conditions and driving signals. If a new sequence of transactions is to be generated, only the sequencer is affected. Likewise, the sequencer does not care how the transactions are converted into pin-level signals; this is the sole responsibility of the driver. This separation into several components results in a more structured testbench design, as there are fewer dependencies than in a monolithic testbench.

The main components of a UVM testbench are as follows:

A
Sequence(r): defines the order of transactions necessary for a given purpose, e.g., a synchronization or reset sequence. The sequencer is responsible for transferring the transactions defined by the sequence to the driver.

A Driver converts transactions into pin-level signals and drives these signals onto the DUT.

An Interface is a SystemVerilog construct which allows the user to group related signals. A DUT may have several interfaces attached. The interface is used to avoid hooking directly into the DUT, making it easier to test multiple DUT versions.

A Monitor monitors all traffic on the interface, converting pin-level signals into transaction-level objects that can be operated on by other components, such as a coverage collector or scoreboard.

An Agent encapsulates monitor, sequencer, and driver, setting configuration values. Agents may be set active or passive (with or without a driver and sequencer). An agent is useful when it is necessary to have multiple instances of the same components, e.g., when a 4-port network switch needs four identical drivers with different configurations.

A Scoreboard is used to check whether correct functionality is achieved. It usually does so by using a "golden model" for co-simulation via the SystemVerilog Direct Programming Interface.

The Environment is used to configure and instantiate all child components. Environments are typically application-specific and may be modified by the test.

The Test is the top-level verification component. The test designer may choose to perform factory overrides of classes and set configuration values here, which modify the child components.

As shown above, even a "Hello, World" example using the UVM requires that the user understands how and why each of the different UVM components should be used. The use of so many components gives the UVM a very steep learning curve, which may discourage adoption. This also means that the UVM is not the proper testing methodology for small designs or one-off tests due to the initial workload. However, once the initial setup of the testbench is finished for large and complex designs, generating new tests becomes easier.
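The division of responsibilities described above can be sketched in plain Scala. The trait names below are illustrative only; they are neither UVM classes nor part of ChiselVerify, but they show why replacing the stimulus only touches one component.

```scala
// Illustrative sketch (not UVM itself): the component roles reduced to
// minimal Scala traits wired around a transaction type.
final case class Tx(addr: Int, data: Int)

trait Sequencer  { def next(): Tx }                    // generates transactions
trait Driver     { def drive(tx: Tx): Unit }           // turns them into pin wiggles
trait Monitor    { def observed(): Seq[Tx] }           // recovers transactions from pins
trait Scoreboard { def check(seen: Seq[Tx]): Boolean } // compares against a model

// Swapping the stimulus only means swapping the Sequencer; the Driver,
// Monitor, and Scoreboard are untouched.
class IncSequencer extends Sequencer {
  private var n = 0
  def next(): Tx = { n += 1; Tx(addr = n, data = n * 2) }
}
```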
A. Scoreboard and DPI
The purpose of the scoreboard is to ensure that the DUT matches the specification. When using directed tests (i.e., hand-written tests meant to test a single part of the specification), this may be as simple as constructing a list of input/output values and checking these in order. When using randomized testing, the scoreboard is usually implemented as a software model (sometimes called a golden model) which is defined to be correct. The software model should exactly mimic the functionality of the DUT and thus provide the same outputs given the same inputs.

The scoreboard may be implemented purely in SystemVerilog, or it may be implemented in a C-like language (C, C++, or SystemC). One of the benefits that SystemVerilog adds to the verification environment is the ability to interface with C code through the SystemVerilog Direct Programming Interface (DPI). This is an interface which allows C code to be called from inside the SystemVerilog testbench, and likewise allows SystemVerilog code to be called from inside a C program. Programming low-level simulations of hardware is most likely easier in C than in plain SystemVerilog. Listing 4 shows a simple example of SystemVerilog code calling C code through the SystemVerilog DPI.

Notice the inclusion of the header file svdpi.h, which contains the definitions necessary for interoperability. Once this is done, the function must be imported in SystemVerilog by use of the import "DPI-C" statement, after which the function may be called as any other SystemVerilog function. Using the DPI is surprisingly simple and painless, and makes it very simple to integrate a C model in the testbench.

It should be noted that a scoreboard does not necessarily "rate" the performance of the DUT by comparing it to other modules, as one might expect from the name. The DUT is only compared against the reference model, and the "rating" is how many assertions pass or fail in a given test.
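In a Scala testbench, the same golden-model idea needs no foreign-function interface at all. The sketch below is a minimal, illustrative scoreboard (not a library API): every DUT output is checked against a software model, and the "rating" is just the pass/fail count.

```scala
// Minimal golden-model scoreboard sketch (illustrative, not a library
// API): DUT outputs are compared one by one against a software model.
final class Scoreboard[I, O](goldenModel: I => O) {
  private var passed, failed = 0

  def check(input: I, dutOutput: O): Unit =
    if (dutOutput == goldenModel(input)) passed += 1 else failed += 1

  // The "rating" is simply how many checks passed and failed.
  def report: (Int, Int) = (passed, failed)
}
```

For example, a hypothetical adder DUT would be checked with `new Scoreboard[(Int, Int), Int]({ case (a, b) => a + b })`.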
B. Constrained Random Verification
Most UVM testbenches employ "Constrained Random Verification" (CRV) for generating test stimuli. This is as opposed to directed testing, where input vectors are predefined to test a certain behaviour. With CRV, random test stimuli are generated. The "Constrained" part of CRV implies that these values are not entirely random, but are chosen to fulfill a series of criteria. If, e.g., the DUT is an ALU with a 5-bit opcode field of which only 20 bit patterns are used, it would be prudent to only generate the 20 valid opcodes for most test purposes. Specific tests could then be written to ensure proper operation if an invalid opcode is asserted.

An example of a class with randomized variables is seen in Listing 5. The keyword constraint is used to constrain a variable defined with the rand keyword.
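The same idea can be sketched in plain Scala (illustrative only; ChiselVerify provides its own CRV constructs): instead of sampling the full 5-bit space, generation is constrained to the valid opcode set. The assumption that the valid opcodes are 0 through 19 is made up for the example.

```scala
import scala.util.Random

// Illustrative constrained-random sketch for the hypothetical ALU above:
// the opcode field is 5 bits wide (32 patterns), but only 20 patterns
// are valid. We assume here, for illustration, that 0..19 are the valid ones.
val validOpcodes: Vector[Int] = Vector.tabulate(20)(identity)

// Constrained generation: draw uniformly from the valid set only,
// instead of filtering rejects out of the full 5-bit range.
def randomValidOpcode(rng: Random): Int =
  validOpcodes(rng.nextInt(validOpcodes.length))
```

Drawing directly from the constrained set is also more efficient than generate-and-reject when the valid subset is small.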
C. Coverage collection
The purpose of coverage collection, in this case functional coverage collection, is to check whether all "necessary" (as defined by the verification engineer) input combinations have been generated. For the ALU mentioned above, it might be interesting to check whether all opcodes work correctly when asserted after a reset, and whether over/underflow flags are correctly set when performing arithmetic operations. In general, functional coverage is concerned with evaluating whether all functionality of a device has been tested. This is opposed to line coverage, which evaluates which lines of code were run during a simulation.

An example of functional coverage collection is seen in Listing 7, where the randomized values from before are covered. A covergroup is a label used to collect relevant values under the same heading. A coverpoint is a directive instructing the simulator that the given value should be monitored for changes. In the declaration of the coverpoint, several bins are generated. These bins correspond to the bins of a histogram. Every time an event occurs which matches the declaration inside a bin, the counter associated with that bin is incremented by one. An event may cause multiple bins to increment at the same time.

In the coverage declarations shown in Listing 7, the covergroup cg_bcd only covers one field, bcd. The bin is labeled by prepending BCD: in front of the coverpoint directive. Ten bins are generated which each sample one of the values 0-9, and the remaining values 10-15 are sampled into the default bin others.

In the covergroup cg_others, three coverpoints are set up. The
VAL coverpoint samples three ranges of values. Any value in the range [0:20] will cause the counter of bin low to increment by one; likewise for the other bins in that coverpoint. The A coverpoint auto-generates one bin for each possible value it may take on, 16 bins in total, since no bins are explicitly declared. The coverpoint OP has one bin, toggle, which only increments when mode toggles from 0 to 1. Finally, the cross statement implements cross coverage. Cross coverage tracks which values were sampled at multiple coverpoints at the same time.

//Cover.svh
class Cover;
  Myclass mc;

  covergroup cg_bcd;
    BCD: coverpoint mc.bcd {
      bins valid[] = {[0:9]};
      bins others = default;
    }
  endgroup: cg_bcd

  covergroup cg_others;
    VAL: coverpoint mc.value {
      bins low = {[0:20]};
      bins high = {[235:255]};
      bins bad = {[110:130]};
      bins others = default;
    }
    A: coverpoint mc.a;
    OP: coverpoint mc.op {
      bins toggle = (0 => 1);
    }
    cross A, OP;
  endgroup: cg_others

  function new();
    cg_bcd = new;
    cg_others = new;
  endfunction: new

  function void sample();
    cg_bcd.sample();
    cg_others.sample();
  endfunction: sample
endclass: Cover

Listing 7. Example SystemVerilog code showing how covergroups and coverpoints are organized.

Using the cross of A and OP, it may be possible to have 100% coverage on both coverpoints (i.e., all bins have been hit at least once), while the cross coverage may not be at 100% if, e.g., OP never toggled while A was 1. Increasing the number of random samples that are generated may alleviate this problem. If it does not, it may be indicative that something is wrong in the structure of the testbench or the DUT.

In Listing 8, the module from Listing 6 has been expanded to also use the coverage collector. In Figure 3, the result of running the 20 iterations is seen for the coverpoints BCD, A, and VAL.

D. C Integration in Scala
The SystemVerilog DPI is a great addition to the verification workflow, as it allows the designer to easily integrate a C model into the SystemVerilog testbench. Similar functionality is available in Scala by leveraging the Java Native Interface (JNI). This is an interface which allows Java/Scala programs to call native code, i.e., code compiled for the specific CPU it is running on. Such code is typically encountered as .dll files on Windows or .so files on Linux.

//top.sv
`include "Myclass.svh"
`include "Cover.svh"

module top;
  Myclass mc;
  Cover cov;

  initial begin
    mc = new;
    cov = new;
    cov.mc = mc;
    for (int i = 0; i < 20; i++) begin
      mc.randomize();
      cov.sample();
    end
  end
endmodule

Listing 8. Showcasing how multiple random values are generated and sampled by the coverage collector.
Fig. 3. Coverage bins generated by running the code in Listing 8, shown for the coverpoints BCD, VAL, and A.
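The histogram semantics behind such coverpoints can be mimicked in a few lines of plain Scala. This is an illustrative sketch only, not the ChiselVerify coverage API: named bins are value ranges with hit counters, and one sample may increment several overlapping bins, just as described above.

```scala
// Illustrative bin-counting sketch of a coverpoint: named value ranges
// with hit counters, mirroring the histogram semantics of covergroups.
final class Coverpoint(bins: Map[String, Range]) {
  private val hits =
    scala.collection.mutable.Map.empty[String, Int].withDefaultValue(0)

  // A sample increments every bin whose range contains the value.
  def sample(v: Int): Unit =
    for ((name, r) <- bins if r.contains(v)) hits(name) += 1

  def count(bin: String): Int = hits(bin)
}
```

With bins matching the VAL coverpoint of Listing 7, sampling a value such as 5 would land in bin low, while 240 would land in bin high.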
When using Chisel and other external libraries, it is recommended to use the Scala Build Tool (sbt) to manage the build. However, doing this makes it more difficult to use the default JNI integration in Scala. This can be alleviated by using the plugin sbt-jni by Jakob Odersky. Listing 9 shows an example of a Scala file with a native function.

The annotation @native informs the Scala compiler that the function should be found in native code. The annotation @nativeLoader is necessary for use with the plugin. The name "native0" is the name of the current sbt project, appended with a 0. Once the file in Listing 9 has been written, the corresponding C header file is generated by running sbt javah.

import ch.jodersky.jni.nativeLoader

// Use nativeLoader to simplify code reuse in other places
@nativeLoader("native0")
class MyClass {
  // --- Native methods
  @native def digitsum(num: Int): Int
  @native def hello(string: String): String
}

object MyClass {
  // Main method to test our native library
  def main(args: Array[String]): Unit = {
    val mc = new MyClass
    val sum = mc.digitsum(1234)
    val string = mc.hello("Scala")
    println(string)
    println(s"Digit sum is $sum")
    // Outputs:
    // Hello from C, Scala
    // Digit sum is 10
  }
}
Listing 9. Example Scala code showing how to integrate native code in Scala.

The contents of the generated header MyClass.h reference the
JNIEnv structure. Once this is done, a CMake makefile is generated by running sbt "nativeInit cmake", and the C files are compiled using sbt nativeCompile. If all goes well, the code can then be run with sbt run.

If changes are made to any of the function definitions or the Scala file, no other steps are necessary than running sbt run again. This will also invoke the CMake script generation and compilation steps if necessary. If new native methods are added to the Scala file, sbt javah must be run again to generate new C headers.

For more information regarding the plugin setup, see the plugin page on GitHub or the file HowToJni.md in the documentation repository.

VIII. CONCLUSION
In this paper, we introduced ChiselVerify, an open-source solution that should increase a verification engineer's productivity by following the trend of moving towards a more high-level and software-like ecosystem for hardware design. With it, we brought functional coverage, statement coverage, constrained-random verification, and transaction-level modelling to the Chisel/Scala ecosystem, thus improving current engineers' efficiency and easing the way for software engineers to join the hardware verification world.
ACKNOWLEDGMENT
This work has been performed as part of the "InfinIT – Innovationsnetværk for IT", UFM case no. 1363-00036B, "High-Level Design and Verification of Digital Systems".