A RISC-V SystemC-TLM simulator
Marius Monton
Departament de Microelectrònica i Sistemes Electrònics, Universitat Autònoma de Barcelona, Barcelona, Spain
ABSTRACT
This work presents a SystemC-TLM based simulator for a RISC-V microcontroller. The simulator focuses on simplicity and ease of extension. It is built around a full RISC-V instruction set simulator (ISS) that supports the full RISC-V ISA and the extensions M, A, C, Zicsr and Zifencei. The ISS is encapsulated in a TLM-2 wrapper that enables it to communicate with any other TLM-2 compatible module. The simulator also includes a very basic set of peripherals to form a complete SoC simulator. The running code can be compiled with standard tools and standard C libraries without modifications. The simulator correctly executes the riscv-compliance suite. The entire simulator is published as a docker image to ease its installation and use by developers. A port of FreeRTOS v10.2.1 for the simulated SoC is also published.
CCS CONCEPTS
• Computer systems organization → Embedded hardware; High-level language architectures; • Hardware → Simulation and emulation.
KEYWORDS
RISC-V, SystemC, TLM-2.0, Simulation Infrastructure, ISS
Many simulators have been published since the release of the first drafts of the RISC-V ISA [8]. These simulators use different techniques and technologies to meet different requirements: good performance, good visualization of the processor, architectural exploration, etc. Most of them conform to the RISC-V ISA specifications; some of them build on an existing infrastructure, adapting the ISS to follow the RISC-V ISA and reusing peripherals that are already simulated [3, 4, 7, 19]. Others are written from scratch and include the ISS and a minimum set of peripherals [6, 11]. There are also FPGA-based simulators that increase performance and simulation speed [9] as well as the precision of the simulation results.

The Spike simulator is the most common simulator and is used as the reference model for the RISC-V ISA [6]. Other simulators are intended for graphical visualization of the entire execution of the instructions inside the CPU [15].

SystemC is a set of libraries for the C++ language that allows the description and simulation of hardware-based systems with an event-driven simulation model. These libraries add time management, concurrency and hardware-like data types to C++ [1].

Transaction Level Modelling (TLM) adds a layer to SystemC in order to model the interface between different modules in a lightweight way. This modeling technique uses transactions to abstract any kind of communication between modules, hiding the details of the communication itself: a transaction is an access from a Master (called Initiator) to a Slave (called Target) at a memory address, with a length and some attributes. The Slave responds to the transaction within a given time (which can be 0 for basic modeling) and completes the write or read that the transaction requests. All other details of the transaction (bus access, signal changes, etc.) are not modeled. In more detailed modeling, the different phases of a bus access can be specified. Currently, the SystemC standard includes TLM modeling [1].
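As an illustration of the transaction concept described above, the following sketch models an initiator-to-target access in plain C++. It deliberately does not use the TLM-2 API (which is built around tlm::tlm_generic_payload and sc_core::sc_time); the names Transaction, SimpleMemory, transport and the 10 ns access time are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified stand-in for a TLM-2 transaction.
enum class Command { Read, Write };
enum class Response { Ok, AddressError };

struct Transaction {
    Command       command;   // read or write access
    std::uint32_t address;   // byte address inside the target
    std::uint8_t* data;      // buffer owned by the initiator
    std::size_t   length;    // number of bytes to transfer
    Response      response;  // filled in by the target
};

// Assumed access time of the target; 0 would model an untimed target.
constexpr std::uint64_t kAccessTimeNs = 10;

// A target: a small memory that serves transactions and reports a delay.
class SimpleMemory {
public:
    explicit SimpleMemory(std::size_t size) : storage_(size, 0) {}

    // Blocking transport: performs the access and adds the access time
    // to delay_ns. Bus phases and signals are not modeled at all.
    void transport(Transaction& trans, std::uint64_t& delay_ns) {
        assert(trans.data != nullptr);
        if (trans.address > storage_.size() ||
            trans.length > storage_.size() - trans.address) {
            trans.response = Response::AddressError;
            return;
        }
        std::uint8_t* cell = storage_.data() + trans.address;
        if (trans.command == Command::Write) {
            std::memcpy(cell, trans.data, trans.length);
        } else {
            std::memcpy(trans.data, cell, trans.length);
        }
        delay_ns += kAccessTimeNs;
        trans.response = Response::Ok;
    }

private:
    std::vector<std::uint8_t> storage_;
};
```

An initiator fills in a Transaction, calls transport() on the target, and then inspects the response field and the accumulated delay; everything else about the bus is abstracted away.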
Modules can also exchange data using direct pointers to memory instead of transactions to increase simulation speed. This technique is named Direct Memory Interface (DMI).

TLM has boosted the interoperability between vendors' models and the appearance of many IPs that are interchangeable and fully compatible among different systems and vendors. The fundamental idea of this work is to bring all these features to a RISC-V simulator.

The source code of the entire project is open-source and published [14].

The presented simulator is intended to be easy to use and simple to extend, with clear code, and able to simulate an entire SoC, like any embedded microcontroller on the market. To keep the code simple, meta-programming has been avoided and the use of C++ templates is kept as low as possible.

The paper is structured as follows: Section II depicts the architecture of the entire simulator, Section III shows software particularities and tool-chain modifications, and Section IV shows simulation performance and compliance results. Section V concludes the paper.
One of the main goals of this simulator is to be easily extensible and modifiable. To achieve this objective, the original design was kept very simple and clear, using naive techniques and source code written for simplicity.

The simulator architecture includes an ISS for the RV32I ISA [20], a bus controller, the main memory and peripherals. Communication between these modules is done through TLM-2 sockets (see Figure 1).
The ISS simulates a single hardware thread (HART) and includes privileged instructions. It is divided into three modules, Instruction, Execute and Registers:
• Instruction: Decodes instructions and checks for extensions. This module can access all fields of each instruction type (R, I, S, B, U and J type).
• Execute: Executes instructions, accessing registers and memory and performing operations. This module also executes the "MACZicsr_Zifencei" extensions [20].
• Registers: Implements the register file for the entire CPU, including the general-purpose registers (r0-r31), the program counter (pc) and all necessary entries in the Control and Status Registers (CSR).

Figure 1: TLM diagram of the entire simulator

This CPU is a minimal, fully functional model with an endless loop that fetches and executes instructions, without a pipeline, branch prediction or any other optimization technique. All instructions are executed in a single cycle, but the model can easily be customized to use a per-instruction cycle count.

The Execute module implements each instruction with a class method that receives the instruction register. These methods perform all the steps needed to execute the instruction. For a branch instruction, the method can change the PC value. For Load/Store instructions, the method is in charge of accessing the required memory address.

The CPU is designed following the Harvard architecture, hence the ISS has separate TLM sockets to interface with external modules:
• Data bus: Simple initiator socket to access data memory.
• Instruction bus: Simple initiator socket to access instruction memory.
• IRQ line: Simple target socket to signal external IRQs.
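The decoding of instruction fields and the one-method-per-instruction style described above can be sketched as follows. The bit positions follow the RV32I base encoding; the type and method names (Instruction, Hart, addi, beq) are hypothetical and are not taken from the simulator sources.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extends the low `bits` bits of `value` (upper bits must be zero).
inline std::int32_t sign_extend(std::uint32_t value, unsigned bits) {
    assert(bits > 0 && bits < 32);
    const std::uint32_t sign = 1u << (bits - 1);
    return static_cast<std::int32_t>((value ^ sign) - sign);
}

// Field accessors for a raw 32-bit RV32I instruction word.
struct Instruction {
    std::uint32_t raw;

    std::uint32_t opcode() const { return raw & 0x7F; }
    std::uint32_t rd()     const { return (raw >> 7) & 0x1F; }
    std::uint32_t funct3() const { return (raw >> 12) & 0x7; }
    std::uint32_t rs1()    const { return (raw >> 15) & 0x1F; }
    std::uint32_t rs2()    const { return (raw >> 20) & 0x1F; }
    std::uint32_t funct7() const { return (raw >> 25) & 0x7F; }

    std::int32_t imm_i() const { return sign_extend(raw >> 20, 12); }
    std::int32_t imm_s() const {
        return sign_extend(((raw >> 25) << 5) | ((raw >> 7) & 0x1F), 12);
    }
    std::int32_t imm_b() const {
        const std::uint32_t v = (((raw >> 31) & 0x1) << 12) |
                                (((raw >> 7) & 0x1) << 11) |
                                (((raw >> 25) & 0x3F) << 5) |
                                (((raw >> 8) & 0xF) << 1);
        return sign_extend(v, 13);
    }
    std::int32_t imm_u() const {
        return static_cast<std::int32_t>(raw & 0xFFFFF000u);
    }
    std::int32_t imm_j() const {
        const std::uint32_t v = (((raw >> 31) & 0x1) << 20) |
                                (raw & 0xFF000u) |
                                (((raw >> 20) & 0x1) << 11) |
                                (((raw >> 21) & 0x3FF) << 1);
        return sign_extend(v, 21);
    }
};

// Minimal hart state with one method per instruction.
struct Hart {
    std::uint32_t pc = 0;
    std::uint32_t x[32] = {};

    void write_reg(std::uint32_t index, std::uint32_t value) {
        if (index != 0) x[index] = value;  // register 0 always reads as zero
    }
    void addi(const Instruction& i) {
        write_reg(i.rd(), x[i.rs1()] + static_cast<std::uint32_t>(i.imm_i()));
        pc += 4;
    }
    void beq(const Instruction& i) {  // a branch method updates the PC itself
        pc += (x[i.rs1()] == x[i.rs2()]) ? static_cast<std::uint32_t>(i.imm_b())
                                         : 4u;
    }
};
```

Keeping each instruction in its own small method makes it straightforward to attach a per-instruction cycle count later, for instance by having each method return the number of cycles it consumed.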
The simulator also includes a Bus controller in charge of the interconnection of all modules. The bus controller decodes the address of each access and forwards the communication to the proper module. In the current status of the project, it contains two target sockets (instruction and data buses) and three initiator sockets, connected to the Memory, Trace and Timer modules described below.
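The address decoding performed by a bus controller of this kind can be sketched as below. The address map (base addresses and sizes of the Trace and Timer windows) is a hypothetical assumption made for this example and is not taken from the paper; every address outside those windows is routed to the Memory module.

```cpp
#include <cassert>
#include <cstdint>

enum class Port { Memory, Trace, Timer };

// Result of decoding: which initiator socket to use and the
// address relative to the selected module.
struct Route {
    Port          port;
    std::uint32_t offset;
};

// Hypothetical address map, not taken from the simulator sources.
constexpr std::uint32_t kTraceBase = 0x40000000;
constexpr std::uint32_t kTraceSize = 0x4;
constexpr std::uint32_t kTimerBase = 0x40004000;
constexpr std::uint32_t kTimerSize = 0x10;  // four 32-bit words

inline bool in_range(std::uint32_t address, std::uint32_t base,
                     std::uint32_t size) {
    assert(size != 0);
    return address >= base && address - base < size;
}

// Selects the target module for an access coming from the
// instruction or data bus.
inline Route decode(std::uint32_t address) {
    if (in_range(address, kTraceBase, kTraceSize)) {
        return {Port::Trace, address - kTraceBase};
    }
    if (in_range(address, kTimerBase, kTimerSize)) {
        return {Port::Timer, address - kTimerBase};
    }
    return {Port::Memory, address};  // default target: main memory
}
```

Adding a new peripheral to such a controller only requires one more window in the map and one more branch in decode(), which is in line with the extensibility goal stated earlier.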
The Memory module simulates a simple RAM, which is the main memory of the SoC and acts as both instruction memory and data memory. This module can read a binary file in Intel HEX format obtained from the .elf file and load it as the main program for the ISS. This module is accessed through a Simple target socket that supports DMI to increase simulation speed.

Figure 2: Simulator running with an xterm window as terminal

Figure 3: Log file view

The simulated SoC includes a very basic Timer module. This module includes two 64-bit registers mapped to 4 addresses. One of these registers (mtime) keeps the current simulated time with nanosecond resolution. The second register (mtimecmp) is intended to program a future IRQ. The module triggers an IRQ using its Simple initiator socket.

The Trace module is a very simple tracing device that outputs the characters it receives to an xterm window. This module is intended as a basic mimic of the ITM module of Cortex-M CPUs [10]. Figure 2 shows the simulator running with an xterm window as output console.

Two other modules are included in the simulator: Performance and Log. The Performance module takes statistics of the simulation, such as instructions executed, registers accessed, memory accesses, etc. It dumps this information when the simulation ends. The Log module allows the simulator to create a log file with different levels of information.

At the maximum level of logging, each executed instruction is logged into the file with its name, address, current time, PC value and the values of the registers or addresses accessed. Figure 3 shows a log file from a real execution.
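The register layout of the Timer module described above (two 64-bit registers seen as four 32-bit words) can be sketched as follows. The word offsets, the class interface and the polling-style irq_due() check are hypothetical simplifications for this example; the actual module signals the IRQ through its initiator socket.

```cpp
#include <cassert>
#include <cstdint>

class Timer {
public:
    // Hypothetical word offsets of the four mapped addresses.
    enum Offset : std::uint32_t {
        MtimeLo    = 0x0,
        MtimeHi    = 0x4,
        MtimecmpLo = 0x8,
        MtimecmpHi = 0xC
    };

    // Reads one 32-bit half; mtime mirrors the simulated time in ns.
    std::uint32_t read(std::uint32_t offset, std::uint64_t now_ns) const {
        assert((offset & 0x3) == 0);  // word-aligned accesses only
        switch (offset) {
            case MtimeLo:    return static_cast<std::uint32_t>(now_ns);
            case MtimeHi:    return static_cast<std::uint32_t>(now_ns >> 32);
            case MtimecmpLo: return static_cast<std::uint32_t>(mtimecmp_);
            case MtimecmpHi: return static_cast<std::uint32_t>(mtimecmp_ >> 32);
            default:         return 0;
        }
    }

    // Writes one 32-bit half of mtimecmp; writes to mtime are ignored here.
    void write(std::uint32_t offset, std::uint32_t value) {
        assert((offset & 0x3) == 0);
        switch (offset) {
            case MtimecmpLo:
                mtimecmp_ = (mtimecmp_ & 0xFFFFFFFF00000000ULL) | value;
                break;
            case MtimecmpHi:
                mtimecmp_ = (mtimecmp_ & 0x00000000FFFFFFFFULL) |
                            (static_cast<std::uint64_t>(value) << 32);
                break;
            default:
                break;
        }
    }

    // True once the simulated time has reached the programmed value.
    bool irq_due(std::uint64_t now_ns) const { return now_ns >= mtimecmp_; }

private:
    std::uint64_t mtimecmp_ = ~0ULL;  // effectively "never" until programmed
};
```

Software running on a 32-bit core has to program mtimecmp with two word writes; writing the high word first, as in the usual RISC-V idiom, avoids a spurious match while the register holds a partially updated value.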
The entire simulator is designed to work as a pure bare-metal simulation. There is no direct communication between the simulator and the host machine, meaning, for instance, that there is no printf implementation that outputs directly to a host computer console. This is intended to make the simulation as similar to real hardware as possible, because exactly the same code and compiled binary that run in the simulator will run in the real SoC.

For this reason the instructions EBREAK and ECALL are implemented as follows: EBREAK stops the simulation and dumps some statistics. In a real system it makes no sense to call the EBREAK instruction; depending on the implementation, it can trigger a system reset or act as a NOP. The ECALL instruction raises an exception, dumps statistics of the simulation and continues the execution, for the same reason.

To make up for the lack of semi-hosting options, the Trace module can be used to print out some information. With the use of proper helper functions, it is possible to use printf()-like functions. In this case, the _write function must be written to send the received data to the Trace module as follows:

    int _write(int file, const char *ptr, int len) {
        int x;
        (void)file; /* every stream is sent to the Trace module */
        for (x = 0; x < len; x++) {
            TRACE = *ptr++; /* each write to TRACE sends one character */
        }
        return len;
    }
The initial value of the Program Counter register (PC) is obtained from the HEX binary file and set before starting the simulation. The stack pointer register (SP) is set to the last memory address.

This flexibility and the compatibility achieved enable the use of the standard GCC cross compiler with few options:

    -march=rv32imac -mabi=ilp32 --specs=nosys.specs

These options specify the architecture and the ABI (Application Binary Interface), and select the bare-metal configuration of the newlib standard C library. This allows complete use of the C library in the application code, including the math, stdio and string libraries.
A docker version of the simulator is provided [12]. It can be used to ease the installation and use of the simulator and, specifically, to spare the user from compiling and gathering all necessary libraries. This image has been used in conjunction with another docker image that contains a riscv-toolchain. The simulator image is published and available on Docker Hub [13].
Table 1: Performance results, in simulated instructions per second

Test         Native        Docker
Test1        8.252.929     3.854.110
Test2        6.298.774     3.291.465
Test3        8.921.763     3.754.295
Test4        12.899.367    4.375.651
Dhrystone    10.700.733    3.796.328
A port of FreeRTOS version 10.2.1 was written for the simulated SoC [2]. The simulator is able to run this complex project without any error. The FreeRTOS test project includes 3 tasks that communicate and synchronize using one common queue. The two producer tasks use FreeRTOS' delay functions to suspend for an amount of time. Only one of the tasks prints out debug information.
Different tests were run to ensure the compatibility of the simulator. Some performance results from the same tests are also presented.

The RISC-V code is compiled with RISC-V GCC version 8.3.0, built with the ABI configured to ilp32 and the architecture set to rv32i. The simulator implements the RISC-V RV32IMACZicsr_Zifencei V2.1 instruction set [20, 21] and it passes all tests in the riscv-tests and riscv-compliance suites [16, 17]. The riscv-compliance tests have a coverage of 97.23% for RV32I, 89.95% for RV32IM and 59.68% for RV32IMC. These percentages indicate the fraction of all possible instruction and register combinations that are tested.

A more complex program, the Dhrystone benchmark, also runs with correct results.

The project code has been statically checked with Coverity by Synopsys. The analysis found only 1 minor error, located in the TLM-2 library code, and no errors in the simulator code itself [18]. Code quality is also checked with the Codacy tool [5]. This tool checks for code quality, security, unused code, etc. The outcome of this tool is an A score, with only 10 minor warnings about code style.

The performance of the simulator is discussed in the next section.
A set of four programs was written to test the performance of the simulator. Of these tests, test 1 checks memory transfers between two memory locations; tests 2 and 3 perform arithmetic operations on three variables, one of them printing out the results and the other not using the console; the last test uses string manipulation functions from the standard C library (printf, sprintf, strcpy).

All tests run an endless loop of some mathematical operations and print out the result using the Trace module. Each test is executed 3 times with different execution times (from 10 to approx. 60 seconds). Figure 4 shows the average of these 3 runs.

Figure 4: Execution results for all tests

Figure 5: Execution results for all tests with the Docker version

The performance varies mainly with the logging level, due to the heavy I/O traffic to the log file. With the lowest level of logging, the performance of the simulator is about 8 million simulated instructions per second (see Table 1 and Figure 4) on an Intel Core i7-8550U CPU @ 1.88 GHz with 16 GB of memory. As a reference, on the same computer the Spike simulator performance is about 170 million simulated instructions per second.

The low performance of Test2 may be due to its intensive use of the Trace module and the overhead it implies.

The Dhrystone benchmark executes with correct results and the performance is about 7200 Dhrystones/second. It has been tested with 10.000, 250.000 and 500.000 loops of the Dhrystone test.
The same tests have been run with the docker version of the simulator. The results are summarized in Table 1 and depicted in Figure 5. With the docker version, the performance has a penalty of 47% to 69%, depending on the test.
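The relative penalty of the docker version can be recomputed from the values in Table 1 with a few lines of code. The sketch below uses only the table entries, so its figures may differ slightly from ranges derived from other runs; the function and type names are hypothetical.

```cpp
#include <cassert>
#include <cstdio>

// Relative performance penalty of the slower configuration, in percent.
inline double penalty_percent(double native_ips, double docker_ips) {
    assert(native_ips > 0.0);
    return 100.0 * (1.0 - docker_ips / native_ips);
}

struct Row {
    const char* name;
    double      native_ips;  // instructions per second, native build
    double      docker_ips;  // instructions per second, docker image
};

// Values copied from Table 1.
const Row kTable1[] = {
    {"Test1",      8252929.0, 3854110.0},
    {"Test2",      6298774.0, 3291465.0},
    {"Test3",      8921763.0, 3754295.0},
    {"Test4",     12899367.0, 4375651.0},
    {"Dhrystone", 10700733.0, 3796328.0},
};

// Prints one line per test with its penalty.
inline void print_penalties() {
    for (const Row& row : kTable1) {
        std::printf("%-10s %5.1f %%\n", row.name,
                    penalty_percent(row.native_ips, row.docker_ips));
    }
}
```

Calling print_penalties() lists the penalty of each test, which makes it easy to see which workloads suffer most from running inside the container.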
This paper introduces a new RISC-V simulator. It has been designed from scratch to simulate an entire SoC with a focus on simplicity, using SystemC and TLM-2 as language and modeling scheme.

The paper has presented the main architecture of the simulator and the software configuration and tools required, followed by a brief discussion of the simulation performance and the conformance to the specifications.

The use of standards is important in every aspect of the engineering effort. In the case of system-level simulators, the TLM-2 and SystemC standards should be encouraged and used by vendors and researchers to increase the interoperability and re-usability of components. This simple simulator is a first step towards this goal.
REFERENCES
[1] 2012. IEEE Standard for Standard SystemC Language Reference Manual. IEEE Std 1666-2011 (Revision of IEEE Std 1666-2005) (2012), 1–638.
[2] Inc Amazon Web Services. 2020. FreeRTOS HomePage.
[3] Proceedings of the Annual Conference on USENIX Annual Technical Conference (Anaheim, CA) (ATEC '05). USENIX Association, USA, 41.
[4] Nathan Binkert, Bradford Beckmann, Gabriel Black, Steven K. Reinhardt, Ali Saidi, Arkaprava Basu, Joel Hestness, Derek R. Hower, Tushar Krishna, Somayeh Sardashti, Rathijit Sen, Korey Sewell, Muhammad Shoaib, Nilay Vaish, Mark D. Hill, and David A. Wood. 2011. The Gem5 Simulator. SIGARCH Comput. Archit. News 39, 2 (Aug. 2011), 1–7. https://doi.org/10.1145/2024716.2024718
[5] Codacy. 2020. Codacy - RISC-V-TLM Dashboard. https://app.codacy.com/manual/mariusmm/RISC-V-TLM/dashboard
[6] RISC-V foundation. 2020. RISC-V Spike. https://github.com/riscv/riscv-tools
[7] Imperas. 2020. A Complete, Fully Functional, Configurable RISC-V Simulator. https://github.com/riscv/riscv-ovpsim
[8] RISC-V International. 2020. RISC-V Software Ecosystem Overview - Simulators. https://riscv.org/software-status/
[9] The Rocket Chip Generator. Technical Report UCB/EECS-2016-17. EECS Department, University of California, Berkeley. https://github.com/chipsalliance/rocket-chip
[10] ARM Limited. 2020. Cortex™-M3 Technical Reference Manual - ITM. http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0337e/BABCCDFD.html
[11] Neethu Bal Mallya, Cecilia Gonzalez-Alvarez, and Trevor E Carlson. 2018. Flexible timing simulation of RISC-V processors with sniper. Simulation
[12] Linux J.
[13] mariusmm/riscv-tlm. https://hub.docker.com/repository/docker/mariusmm/riscv-tlm
[14] Màrius Montón. 2020. RISC-V-TLM Simulator. https://github.com/mariusmm/RISC-V-TLM
[15] Morten Petersen. 2020. Ripes. https://github.com/mortbopet/Ripes
[16] RISCV.org. 2020. RISC-V Compliance Task Group. https://github.com/riscv/riscv-compliance/
[17] RISCV.org. 2020. RISC-V Unit Tests. https://github.com/riscv/riscv-tests
[18] Synopsys. 2020. Coverity - RISC-V-TLM. https://scan.coverity.com/projects/mariusmm-risc-v-tlm
[19] Tuan Ta, Lin Cheng, and Christopher Batten. 2018. Simulating multi-core RISC-V systems in gem5. In Workshop on Computer Architecture Research with RISC-V.
[20] Andrew Waterman and Krste Asanović. 2019. The RISC-V Instruction Set Manual, Volume I: User-Level ISA. Technical Report Version 20191213. RISC-V Foundation.
[21] Andrew Waterman and Krste Asanović. 2019.