uTango: an open-source TEE for the Internet of Things
Daniel Oliveira, Tiago Gomes, Sandro Pinto
Centro ALGORITMI - University of Minho
{daniel.oliveira, mrgomes, sandro.pinto}@dei.uminho.pt

Abstract: Security is one of the main challenges of the Internet of Things (IoT). IoT devices are mainly powered by low-cost microcontrollers (MCUs) that typically lack basic hardware security mechanisms to separate security-critical applications from less critical components. Recently, Arm has started to release Cortex-M MCUs enhanced with TrustZone technology (i.e., TrustZone-M), a system-wide security solution aiming at providing robust protection for IoT devices. Trusted Execution Environments (TEEs) relying on TrustZone hardware have been perceived as safe havens for securing mobile devices. However, for the past few years, considerable effort has gone into unveiling hundreds of vulnerabilities and proposing a collection of relevant defense techniques to address several issues. While new TEE solutions built on TrustZone-M start flourishing, the lessons gathered from the research community appear to be falling short, as these new systems are falling into the déjà vu pitfalls of the past. In this paper, we present uTango, the first multi-world TEE for modern IoT devices. uTango proposes a novel architecture aiming at tackling the major architectural deficiencies currently affecting TrustZone(-M)-assisted TEEs. In particular, we leverage the very same TrustZone hardware primitives used by dual-world implementations to create multiple, equally-secure execution environments within the normal world. We demonstrate the benefits of uTango by conducting an extensive evaluation on a real TrustZone-M hardware platform, i.e., Arm Musca-B1. uTango will be open-sourced and freely available on GitHub in hopes of engaging academia and industry in securing the foreseeable trillion IoT devices.

Index Terms: Trusted Execution Environment, TrustZone, TEE, separation, isolation, IoT, Arm.

I. INTRODUCTION
With the increasing complexity of Internet of Things (IoT) devices and the door left open by Internet connectivity to hackers and attackers, developing secure IoT devices is becoming increasingly challenging [1]–[3]. Complex functional requirements are met by integrating multiple codebases, drivers, and libraries from different third-party entities with different assurance levels. The problem is exacerbated by the lack of simple and reliable mechanisms to enforce separation among these multi-source and mixed-criticality components [4]–[6].

In the context of secure computing systems, security through separation is a well-established principle implemented by microkernels, hypervisors, and Trusted Execution Environments (TEEs) [7]–[23]. In particular, billions of mobile devices worldwide rely on TEEs leveraging TrustZone hardware primitives for the protection of security-critical applications (e.g., digital rights management, fingerprints, and keys) [11], [14], [15], [18], [19], [24]–[26]. TrustZone enables the partitioning of system resources into two domains: the secure world for secrets and critical functionality, and the normal world for everything else, including the rich operating system (OS) and its applications. Within the realm of the IoT, Arm has adapted TrustZone technology for the Cortex-M family, introducing TrustZone-M into new Armv8-M MCUs, e.g., Cortex-M33 [6], [27], [28].
Problem and Motivation.
TrustZone-assisted TEEs are assumed to be highly secure. However, over the past years, TrustZone-assisted TEEs have been attacked hundreds of times [19], [29]–[34]. A recent study of five major commercial TrustZone-assisted TEEs (i.e., Qualcomm, Trustonic, Huawei, Nvidia, and Linaro) identified several architectural deficiencies, critical implementation bugs, and overlooked hardware properties [34]. Of these three classes of problems, the architectural issues have the largest share. Among the identified deficiencies, we highlight (i) the excessively large trusted computing base (TCB), e.g., QSEE has 1.6 MiB, (ii) the large number of interfaces, e.g., QSEE has 69 syscalls, (iii) the existence of several privileged secure kernel drivers, and (iv) the asymmetrical isolation between the worlds, e.g., trusted applications can map normal world memory [34]. Furthermore, the majority of the implementation issues could be mitigated with adequate architectural mechanisms [34].

In general, although these problems have mainly affected commercial implementations targeting Cortex-A processors, new TEE solutions built on TrustZone-M MCUs are falling into the same pitfalls of the past. The Arm Trusted Firmware-M (ATF-M) [35] implements a large number of kernel components and security services within the secure world, and existing memory protection mechanisms (i.e., secure MPU) are configured with too coarse-grained regions. As a result, ATF-M has a TCB of hundreds of kilobytes (150+ KiB). Kinibi-M was adapted from the original Kinibi TEE for mobile devices rather than re-invented for the IoT [36]. Lastly, Arm is currently spreading an ambiguous message with regard to what should be deployed within the secure world. Multiple official Arm TrustZone-M documents [37]–[39] suggest different approaches. In particular, in Ref. [37], for an IoT application targeting a wireless communication interface, Arm suggests including (i) the secure boot, (ii) the communication stack, (iii) device drivers, (iv) the OS kernel, and (v) firmware update within the secure world, and a single communication buffer in the normal world.

Thus, we argue that the current dual-world model falls short of addressing the increasing complexity and requirements of modern IoT devices, as the number of functional blocks expected to be consolidated largely exceeds two. We believe that a novel multi-world architecture, enabling execution of multiple environments within strongly isolated compartments, would provide higher flexibility and increased security guarantees in the context of the IoT.

Contributions.
In this paper, we present uTango, an open-source TEE for the IoT that aims at tackling the main architectural deficiencies that currently affect TrustZone(-M)-assisted TEEs. To do so, uTango proposes a novel TEE architecture that leverages the very same TrustZone-M hardware primitives used by dual-world implementations to provide multiple equally-secure execution environments and augmented TEE capabilities (e.g., SGX-like enclaves, availability guarantees). To the best of our knowledge, uTango is the first multi-world TEE for TrustZone-M MCUs. To create an unlimited number of virtual worlds, uTango leverages the dynamic reconfiguration capabilities of TrustZone-M controllers. Each controller is dynamically programmed to partition system resources according to each execution environment's memory, device, and interrupt assignments.

The design follows three main principles: (i) principle of minimal implementation, by providing a minimal and clean-slate implementation, with a small number of well-defined interfaces, thereby drastically reducing the overall system TCB; (ii) principle of least privilege, by ensuring that uTango, as the highest privilege entity, is the single component running within the secure world; and (iii) principle of containment, by enforcing that each functional block executes in its own isolated execution domain, which inherently prevents lateral movement and privilege escalation. The proof-of-concept implementation leverages best-in-class techniques to minimize performance penalties and to make the TEE suitable for formal verification. The evaluation conducted on the Arm Musca-B1 shows that uTango has a minimal impact on performance and that the TCB of the system is an order of magnitude smaller than alternative solutions.

To summarize, our main contributions are as follows:
1) We present the design of uTango as a novel TEE architecture leveraging TrustZone-M hardware primitives to provide an unlimited number of equally-secure execution environments (Section III).
2) We provide a proof-of-concept implementation of uTango targeting the first publicly available TrustZone-M platform, i.e., Arm Musca-B1 (Section IV), and we have also developed a Reference IoT Application. All software components will be open-sourced and available on GitHub.
3) We perform a comprehensive security analysis and discuss how uTango can mitigate potential attack vectors that a malicious adversary may explore (Section V).
4) We extensively evaluate uTango focusing on security metrics, performance, interrupt latency, and TCB size and complexity (Section VI).

II. BACKGROUND
A. Arm TrustZone-M
Arm TrustZone follows a system-wide approach to security, providing hardware-enforced protection mechanisms at the CPU and System-on-Chip (SoC) level [19]. This technology is centered around the concept of protection domains named secure world and normal world. TrustZone was first introduced into Arm application processors (Cortex-A) in 2004, achieving mainstream adoption in the mobile industry. In 2016, Arm decided to extend TrustZone to the new generation of Arm MCUs, i.e., Armv8-M (e.g., Cortex-M23, Cortex-M33), naming this new version of the technology TrustZone-M. From a high-level perspective, both technologies follow the same dual-world architecture. However, at the low level, there are significant differences, mainly because TrustZone-M is entirely optimized for low-end devices (e.g., deterministic behavior, low overhead, and low power consumption) [37].
Programming Model.
Armv7-M MCUs provide two operation modes: thread and handler mode. In thread mode, the processor executes application code, which can be either privileged or unprivileged. In handler mode, the processor executes exception handler code, which is always privileged. In (Armv8-M) TrustZone-enabled MCUs, these operation modes are orthogonal to the two security states, i.e., there is both a thread and a handler mode for each security state. The security state does not depend on a specific security bit; instead, the division is memory-map based. This means that, when the code is running from secure memory, the processor state is secure, and, when the code runs from non-secure memory, the processor state is non-secure. Transitions between the two worlds are supported by three new instructions: branch with exchange to non-secure state (BXNS), secure gateway (SG), and branch with link and exchange to non-secure state (BLXNS). Calling non-secure software from the secure state is possible by performing a BLXNS instruction. In contrast, non-secure software cannot directly call secure software. Instead, non-secure software must use indirect entry points located in a Non-Secure Callable (NSC) memory region. The first instruction of any entry point must be an SG, which marks a valid branch to secure code. After the secure function completes, a BXNS instruction issues a return to the non-secure software. Furthermore, state transitions can also happen due to exceptions or interrupts.
Resources Partitioning.
In Armv8-M MCUs, the memory space is partitioned through the so-called Attribution Units. The Security Attribution Unit (SAU) is always available and provides dynamic address partitioning. The number of regions is defined by the chip designer and is typically 8. The SAU is programmable in secure state. The Implementation-Defined Attribution Unit (IDAU) is external to the core, and it is implementation-defined. The IDAU provides static address partitioning and supports up to 256 non-programmable regions. The security state of a memory region results from the logical OR operation between the SAU and IDAU configurations. Memory can also have a set of privilege permissions, which are defined by a TrustZone-aware Memory Protection Unit (MPU). MPUs are banked among worlds; however, MPUs are optional and implementation-defined. In addition to the attribution units, other components, referred to as security gates, ensure the overall system's security. Besides the CPU, additional bus masters within the SoC are connected to the bus (e.g., DMA controllers, crypto engines). TrustZone-enabled MCUs provide security gates, which act as firewalls on TrustZone-oblivious slaves. These components are controlled by a central system security controller, specified by silicon providers.
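As a minimal illustration of how the two attribution units combine, the sketch below models the rule that the more secure of the SAU and IDAU verdicts wins, which is what an OR over a "secure" flag amounts to. The type and function names (`sec_attr_t`, `combine_attr`) are hypothetical and exist only for this example; this is a host-side model, not code that touches the real units.

```c
/* Security attributes ordered from least to most secure.
 * Hypothetical names, used only for this illustration. */
typedef enum {
    ATTR_NS  = 0,  /* non-secure */
    ATTR_NSC = 1,  /* secure, non-secure callable */
    ATTR_S   = 2   /* secure */
} sec_attr_t;

/* Combine the SAU and IDAU verdicts for one address:
 * the more secure of the two attributes is the final one. */
static sec_attr_t combine_attr(sec_attr_t sau, sec_attr_t idau)
{
    return (sau > idau) ? sau : idau;
}
```

Under this model, a region that the SAU marks non-secure still ends up secure if the IDAU marks it secure, so software cannot use the SAU to weaken the fixed IDAU map.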
Interrupt Handling.
In TrustZone-enabled MCUs, interrupts can be set as secure or non-secure by configuring the Interrupt Target Non-secure (ITNS) interface on the Nested Vectored Interrupt Controller (NVIC). Arm's M-profile architectures support automatic hardware stacking and un-stacking of some CPU registers during exception entry to reduce interrupt latency. Armv8-M-based architectures follow the same concept, although with a few differences. If the arriving exception or interrupt has the same state as the current processor state, the exception sequence is almost identical. The main difference occurs when a non-secure interrupt triggers while secure code is executing. To avoid information leakage, the processor automatically pushes all non-banked registers to the secure stack and erases their contents, which incurs a slightly longer interrupt latency. The vector table (and exception handlers) are banked between states, i.e., the processor supports two separate exception vector tables. Furthermore, secure and non-secure interrupts can share the same priority level, or secure interrupts can be programmed to have higher priority than non-secure ones (to avoid denial-of-service attacks).
B. Arm Trusted Firmware-M
The Arm Trusted Firmware-M (ATF-M) is an open-source, secure world firmware reference implementation, which offers the foundations of a TEE for Armv8-M MCUs. The ATF-M implements: (i) a secure boot to verify the integrity and authenticity of secure and non-secure binaries; (ii) a core module (i.e., ATF-M Core) that controls the isolation, communication, and execution; and (iii) a set of security services offering secure storage, crypto, and attestation mechanisms. The implementation follows the traditional TrustZone dual-world architecture, i.e., all secure components (e.g., bootloader, kernel modules, secure services, and third-party security functions) are encapsulated within the secure world. The solution implements the isolation levels defined in the Platform Security Architecture (PSA) [40], which rely on platform hardware (e.g., SAU, secure MPU) to enforce isolation boundaries. As of this writing, ATF-M only implements isolation levels 1 and 2, which partition the system into three major domains. Isolation level 1 establishes the two specific security domains enabled by TrustZone (i.e., the secure and non-secure worlds). Isolation level 2 goes a step further and leverages the secure MPU to isolate the ATF-M core and services from third-party security services. The latter are expected to be developed by multiple entities, and are not the same services as the ones provided by the ATF (i.e., secure storage, attestation). Isolation level 3 is still under development and will provide fine-grained isolation among different third-party security services.

                                      STM32L5   Musca-B1
Binary Size (KiB)    bootloader         103        50
                     core+services      192       250
                     total              295       300
Code Size (K SLoC)   total               33        40
Security Metrics     indirect calls     147       257

TABLE I: ATF-M TCB size, code size, and security metrics for NUCLEO-L552ZE-Q and Musca-B1 platforms.

Fig. 1: Classic TEE's dual-world model mapped to the uTango multi-world architecture.
ATF-M preliminary evaluation.
To understand the ATF-M codebase complexity, we performed a preliminary evaluation with regard to binary size, code size, and security metrics of the ATF-M implementation for two different platforms: Arm Musca-B1 and STM NUCLEO-L552ZE-Q. Table I summarizes the assessed results. The off-the-shelf implementation of the ATF-M for the Arm Musca-B1 has a total size of 50 KiB for the secure bootloader and 250 KiB for the ATF-M core and security services. The implementation has a large code size, encompassing approximately 40 K source lines of code (SLoC). We also evaluated the number of indirect calls and Return-Oriented Programming (ROP) gadgets, which are important security metrics as further explained in Section VI. We counted a total of 257 indirect calls and 15597 ROP gadgets across the different binaries that comprise the TCB. The numbers for the STM port are somewhat better, but of the same order of magnitude, i.e., 192 KiB for the ATF-M core and security services, 33 K SLoC, 9957 ROP gadgets, and 147 indirect calls. In summary, this preliminary evaluation suggests that the same architectural issues highlighted in the literature [34] are being repeated. Furthermore, the formally verified microkernel seL4 [7], which targets high-end Linux-capable platforms endowed with a memory management unit (MMU), has a smaller TCB and a simpler codebase [34] than ATF-M, which is expected to run on resource-constrained MCUs.
III. UTANGO DESIGN

uTango aims at tackling the main architectural deficiencies prevailing in TrustZone-assisted TEE systems [34]. To do so, uTango evolves from the classic dual-world security model to a multi-world architecture (Figure 1). The multi-world architecture is based on the zero-trust model, which dictates that every single software component, with the exception of the TEE kernel, cannot be trusted. Thus, uTango enables the consolidation of multiple applications, services, or workloads (e.g., embedded OSes) on equally secure, isolated domains, called Non-Secure Virtual Worlds (NSVWs).

Another particularity of the uTango design is the augmentation of the TEE model and capabilities. TrustZone(-M)-assisted TEEs implement remote procedure call (RPC) architectures, i.e., a client-server model mainly used for mobile devices. Modern TEEs aim at providing a broader range of features and fulfilling a much larger spectrum of use cases and requirements [22], [41]. Thus, uTango not only supports the traditional core of the TrustZone design but also provides capabilities similar to SGX-like enclaves [11], [18], [20]. Within the NSVWs, uTango runs unmodified binaries, which can be user-space applications, services, and libraries, or privileged OSes/RTOSes and their respective applications. Furthermore, uTango goes a step further by providing increased availability guarantees.

The design of the uTango multi-world architecture is centered on three fundamental principles: (i) principle of minimal implementation, (ii) principle of least privilege, and (iii) principle of containment. We comply with these construction principles, and we show that their systematic application results in (i) a reduced TCB and attack surface, (ii) a well-defined layered access control to prevent privilege escalation, and (iii) strong isolation boundaries that restrict each workload's access to only its own resources (containing exploits).
Principle of minimal implementation.
To contain the system's attack surface, uTango must rely on hardware support as much as possible, and provide a minimal and clean implementation of well-defined structures and interfaces. Moreover, uTango must de-privilege secure applications/services to the normal world, thus reducing the amount of code to be trusted and deployed on the secure side.
Principle of least privilege.
To mitigate privilege escalation, the uTango kernel must be granted the highest privilege of execution while de-privileging secure applications/services to the normal world. Furthermore, each execution domain must only have access to those resources that are absolutely required (e.g., devices, system services).
Principle of containment.
To limit the extent of an attack, uTango must ensure that the multiple execution domains are well-defined and self-contained with clear boundaries. The system must use hardware-enforced mechanisms to sandbox each domain to its own resources (e.g., memory, devices), thereby limiting the reach of an attacker and preventing lateral movement across other system components.
A. Architecture Overview
Figure 1 depicts, at the top, the traditional dual-world TrustZone-M architecture. At the bottom, it illustrates an example of the uTango architecture mapping legacy dual-world systems into a multi-world horizontal scheme, where each functional block (i.e., IoT-OS and trusted applications) is assigned to an individual NSVW. In addition, as highlighted in the figure, the uTango kernel is the single component running at the most privileged level (i.e., secure handler mode), while all NSVWs run in the non-secure state. Following the aforementioned design principles, uTango must strive for a clean-slate minimal implementation. Thus, as depicted in Figure 1, uTango is built around three components: (i) the system partitioner (SP), (ii) the worlds scheduler (WS), and (iii) the worlds' communication channel (WCC). The SP relies on a configuration file detailing the overall system configuration and partitioning.
System Configuration.
The first piece of the uTango workflow starts with a configuration file (CFG file in Figure 1) that defines the properties of each NSVW. These properties encompass memory regions (e.g., code, data), devices (e.g., serial peripherals, timers), and interrupts assigned to each NSVW, as well as the overall system time quantum.
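The paper does not give the configuration file format, so the following is only a sketch of what such a per-NSVW description could look like if expressed as static C data. Every type, field name, address, interrupt number, and the quantum unit are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 4  /* hypothetical per-world limits */
#define MAX_IRQS    4

/* One contiguous address range (memory or device window). */
typedef struct {
    uint32_t base;
    uint32_t size;
} range_t;

/* Properties of one NSVW: memory, devices, and interrupts. */
typedef struct {
    const char *name;
    range_t     memory[MAX_REGIONS];
    size_t      n_memory;
    range_t     devices[MAX_REGIONS];
    size_t      n_devices;
    uint8_t     irqs[MAX_IRQS];
    size_t      n_irqs;
} nsvw_cfg_t;

/* Whole-system configuration, including the scheduling quantum. */
typedef struct {
    uint32_t          quantum_us;  /* assumed unit: microseconds */
    const nsvw_cfg_t *worlds;
    size_t            n_worlds;
} system_cfg_t;

/* Illustrative values only; they do not describe a real board layout. */
static const nsvw_cfg_t worlds[] = {
    { .name = "iot-os",
      .memory  = { { 0x00200000u, 0x10000u }, { 0x20000000u, 0x8000u } },
      .n_memory = 2,
      .devices = { { 0x40101000u, 0x1000u } },
      .n_devices = 1,
      .irqs = { 33 }, .n_irqs = 1 },
    { .name = "trusted-app",
      .memory  = { { 0x00210000u, 0x10000u }, { 0x20008000u, 0x8000u } },
      .n_memory = 2,
      .n_devices = 0,
      .n_irqs = 0 },
};

static const system_cfg_t sys_cfg = {
    .quantum_us = 1000u,
    .worlds     = worlds,
    .n_worlds   = sizeof worlds / sizeof worlds[0],
};
```

Keeping the description as constant data means it can be placed in read-only memory and consumed once at boot, which fits the static partitioning described for the SP.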
System Partitioner.
According to the system configuration, the SP is responsible for statically partitioning the platform resources at boot-time. The SP leverages the SAU and additional platform-specific bus filters (security gates) to achieve such partitioning. Based on each NSVW configuration, the SP prepares a corresponding SAU configuration and saves it in the world's control block (WCB). Regarding the security gates, the SP only performs a one-time setup. uTango enforces that all accesses and transactions issued by NSVWs to other bus masters are always trapped and mediated. This prevents the reconfiguration of each bus filter during the context switch, avoiding additional performance burdens.
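Before per-world SAU tables can be prepared, the configured regions have to be disjoint; the implementation section notes that the SP aborts when it finds an overlap. A minimal sketch of such a check follows, assuming regions are given as base plus non-zero size. The names `region_t`, `regions_overlap`, and `layout_is_valid` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t base;
    uint32_t size;  /* assumed non-zero */
} region_t;

/* Two half-open ranges [base, base+size) overlap when each one
 * starts before the other ends. 64-bit ends avoid wrap-around. */
static bool regions_overlap(region_t a, region_t b)
{
    uint64_t a_end = (uint64_t)a.base + a.size;
    uint64_t b_end = (uint64_t)b.base + b.size;
    return a.base < b_end && b.base < a_end;
}

/* Pairwise check over all configured regions; returns false on
 * the first overlap so the caller can abort the boot. */
static bool layout_is_valid(const region_t *r, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (regions_overlap(r[i], r[j])) {
                return false;
            }
        }
    }
    return true;
}
```

The quadratic loop is acceptable here because the check runs once at boot over a handful of regions.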
Worlds Scheduler.
uTango enforces temporal separation through the WS. According to the system time quantum, the WS, supported by an architectural timing unit (e.g., Arm SysTick), schedules each NSVW in a round-robin fashion. Every NSVW has a unique WCB data structure for preserving the world state, i.e., CPU register bank, selected System Control Block (SCB) registers, SAU configuration table, and interrupt state. At every scheduling point, the WS performs four main activities: (i) saves the suspended NSVW context to the WCB; (ii) schedules a new NSVW to be resumed; (iii) sets new partition regions on the SAU; and (iv) restores the context of the new NSVW. Notice that, while setting a new configuration on the SAU, resources belonging to the suspended NSVWs are preserved and marked as secure, preventing possible unauthorized accesses from the running NSVW.
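The four activities can be sketched as a host-side model. This is not the authors' scheduler: the CPU context is reduced to a small register array, SAU reprogramming is reduced to a per-world `non_secure` flag, and all names (`wcb_t`, `ws_t`, `ws_schedule`) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N_WORLDS 3
#define N_REGS   4  /* stand-in for the real register bank */

/* Simplified world control block. */
typedef struct {
    uint32_t regs[N_REGS];  /* saved CPU context */
    bool     non_secure;    /* true only while this world's resources are exposed */
} wcb_t;

/* Simplified scheduler state plus simulated live CPU registers. */
typedef struct {
    wcb_t    worlds[N_WORLDS];
    size_t   current;
    uint32_t cpu_regs[N_REGS];
} ws_t;

/* One scheduling point; returns the index of the resumed world. */
static size_t ws_schedule(ws_t *ws)
{
    wcb_t *prev = &ws->worlds[ws->current];

    /* (i) save the suspended world's context into its WCB */
    memcpy(prev->regs, ws->cpu_regs, sizeof prev->regs);

    /* (ii) pick the next world, round-robin */
    size_t next = (ws->current + 1) % N_WORLDS;

    /* (iii) repartition: hide the old world, expose the new one */
    prev->non_secure = false;
    ws->worlds[next].non_secure = true;

    /* (iv) restore the context of the scheduled world */
    memcpy(ws->cpu_regs, ws->worlds[next].regs, sizeof ws->cpu_regs);

    ws->current = next;
    return next;
}
```

At any time exactly one world has `non_secure` set, which mirrors the statement that resources of suspended NSVWs stay marked as secure.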
Worlds Communication Channel.
uTango offers a communication infrastructure that allows the exchange of secure messages across NSVWs using message passing, i.e., no shared memory. Messages have a fixed 12-byte data stream length. uTango provides four APIs (i.e., blocking and non-blocking) to send and receive messages. The WCC acts as
Fig. 2: uTango full system implementation view (CPU and bus) for the Arm Musca-B1 targeting a two NSVW configuration. Snapshot of the system configuration while one NSVW is running.

IV. UTANGO IMPLEMENTATION
System Setup.
The uTango TEE was first implemented for the Arm Musca-B1 Test Chip Board [42], which implements the SSE-200 subsystem that features a multi-core system with two Cortex-M33 processors. Despite the dual-core architecture, uTango currently only supports a single-core configuration. The majority of MCU-based platforms are powered by single-core CPUs. A few platforms are starting to include dual-core architectures; thus, multi-core support is on the roadmap, but currently out of the scope of this paper. Figure 2 depicts a low-level view of uTango running a two NSVW configuration on the Arm Musca-B1. At the time of this writing, we are working on the support for two additional platforms: NXP LPC55S69-EVK and STM NUCLEO-L552ZE-Q.

uTango Hardware Components.
The main software components of uTango rely on hardware primitives available on TrustZone-based MCUs. As previously mentioned, the WS uses the secure SysTick as the temporal source for scheduling all NSVWs. To partition the system, the SP configures the SAU to overlap the fixed IDAU memory security regions and specify the overall system memory layout. With the SAU correctly configured, core transactions (including data read/write, instruction fetches, and debug access) are secured. As previously stated, we assume bus masters are always secured and managed by uTango, so TrustZone-aware bus slaves (i.e., memories and peripherals) need to be configured according to the overall system security model. We configure the Musca-B1 security gates, i.e., the Memory Protection Controller (MPC) and the Peripheral Protection Controller (PPC), to match all NSVWs' memory and device assignments. The remaining memory blocks or device sets are kept secure.
A. Execution Life Cycle
The uTango execution life cycle for a system configured with 2 NSVWs is depicted in Figure 3. The complete boot process consists of three stages: (i) Initialization; (ii) SP Partitioning; and (iii) Kicking-off. At run-time, uTango is only responsible for the WS Scheduling. In the following, we describe each stage.

Fig. 3: uTango execution life cycle for a system configured with 2 NSVWs.

(1) uTango Initialization.
After reset, the uTango boot agent initializes preliminary CPU- and platform-specific hardware components. Then, it reads the full system raw binary file (loaded onto the FLASH) and copies each software piece (i.e., uTango kernel and NSVWs) to its respective memory region. The boot agent is also responsible for verifying the integrity and authenticity of the uTango kernel image and, in case of success, for copying the image to the Tightly-Coupled Memory (TCM). This implementation detail ensures that almost all uTango kernel instructions and memory accesses take 1-2 clock cycles, that there is no additional burden due to wait states and bus/memory stalls (performance), and that code and data are never cached (security). After copying each NSVW to the respective memory segment, the initialization concludes by configuring the secure MPU to enforce policies among TEE kernel code and data sections, setting up the vector table address of uTango, and jumping to the main initialization routine.

After boot, the uTango kernel starts executing by first enabling and configuring fault exceptions. Non-secure exceptions are configured with lower priority than secure ones, thus preventing starvation of the secure side, i.e., avoiding DoS attacks. The secure SysTick timer is then configured according to the system quantum configuration, i.e., the reload value is loaded, and the timer exception is enabled. The last part of the Initialization process fills the internal WCB structures with the respective NSVWs' static configurations. The WCB encompasses 9 general-purpose registers (r4-r14), 8 special-purpose registers (i.e., msp, psp, msplim, psplim, basepri, primask, faultmask, control), and a subset of specific SCB registers (e.g., vtor, scr). To speed up the SAU reconfiguration during the world-switching operation, the SAU configuration for each NSVW is defined as part of the WCB. The last part of the WCB includes an interrupt descriptor that keeps the NVIC registers' context (e.g., priority level, enable and pending status, and security state), which we further detail in Section IV-B.

(2) SP Partitioning.
After the uTango Initialization, the execution flow continues through the SP Partitioning. The SP is responsible for leveraging all available hardware mechanisms to partition the system resources according to the NSVWs' settings. First, the SP unrolls all NSVWs' memory regions and checks for overlapping regions; if an overlap is identified, uTango aborts execution. If all memory regions are valid, the SP starts programming the SAU with the memory regions assigned to the first NSVW.

(3) uTango Kicking-off.
In the last boot stage, uTango is responsible for configuring the CPU state for the first NSVW and kicking off the execution. A non-secure call is issued to the entry point of the NSVW.

(4) WS Scheduling.
At run-time, uTango is mainly responsible for scheduling and context switching NSVWs. The WS keeps the suspended NSVWs' states and resources in the secure world while remapping the next-to-run NSVW's resources as non-secure. The WS process consists of four main steps (4.1-4.4). In the first step (4.1), the WS saves the processor context of the suspended NSVW. Thus, all general-purpose and special CPU registers, as well as selected SCB registers, are stored in the respective WCB. The WS is implemented in assembly, enabling these multiple accesses to the WCB memory segment to be combined into a single store-multiple instruction (STM) to improve performance. The NVIC state is also preserved in the WCB interrupt descriptor (details in Section IV-B). Next (4.2), the scheduler selects the next-to-run NSVW, according to a round-robin policy. In step three (4.3), the WS retrieves the stored SAU data table from the scheduled NSVW's WCB entry to program the SAU. The SAU reconfiguration also relies on fine-grained assembly customizations based on load-multiple instructions (LDM). However, due to the lack of fast-reconfiguration mechanisms in the SAU, the WS needs to program all eight SAU regions by iteratively accessing the RNR register. Finally, in step (4.4), the context of the scheduled NSVW is loaded to the CPU. At this point, a branch is issued and processor execution is switched to the non-secure state.
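Step (4.3) can be pictured with a simulated register file. In the sketch below, the region-number register selects which region the base (RBAR) and limit (RLAR) writes land in, so all eight regions are written one after another. The structure `sau_sim_t` is a host-side stand-in and not the memory-mapped SAU block; the helper names are hypothetical, and the 32-byte granularity and the enable bit in RLAR bit 0 follow the Armv8-M register layout.

```c
#include <stdint.h>

#define SAU_REGIONS     8
#define SAU_ADDR_MASK   0xFFFFFFE0u  /* regions are 32-byte granular */
#define SAU_RLAR_ENABLE 0x1u         /* bit 0 of RLAR enables the region */

/* One precomputed table entry, as it could be stored in a WCB. */
typedef struct {
    uint32_t rbar;
    uint32_t rlar;
} sau_entry_t;

/* Simulated SAU: RNR selects which region RBAR/RLAR writes hit. */
typedef struct {
    uint32_t    rnr;
    sau_entry_t bank[SAU_REGIONS];
} sau_sim_t;

/* Build an enabled non-secure region from base and limit addresses. */
static sau_entry_t sau_region(uint32_t base, uint32_t limit)
{
    sau_entry_t e;
    e.rbar = base & SAU_ADDR_MASK;
    e.rlar = (limit & SAU_ADDR_MASK) | SAU_RLAR_ENABLE;
    return e;
}

/* Reprogram every region from a world's stored table. Entries left
 * at zero disable their region, so stale mappings never survive. */
static void sau_load_table(sau_sim_t *sau, const sau_entry_t table[SAU_REGIONS])
{
    for (uint32_t i = 0; i < SAU_REGIONS; i++) {
        sau->rnr = i;                         /* select region i */
        sau->bank[sau->rnr].rbar = table[i].rbar;
        sau->bank[sau->rnr].rlar = table[i].rlar;
    }
}
```

Writing all eight entries on every switch, including the disabled ones, is what keeps the regions of the previously running NSVW from remaining visible to the next one.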
B. Worlds Interrupt Handling
In TrustZone-M MCUs, the NVIC registers are not banked between security states. The ITNS register enables the configuration of the interrupt's security target. Once an interrupt is configured as secure, accesses to the associated fields in non-secure aliases are read-as-zero. Thus, the NVIC's non-secure state must be preserved for each interrupt assigned to an NSVW. The WCB structure features a descriptor that holds the Interrupt Set Enable Register (ISER), Interrupt Set Pending Register (ISPR), Interrupt Priority Register (IPR), and ITNS. Interrupt management was first implemented using a non-preemptive mechanism where interrupts are served as soon as the respective NSVW is scheduled. In this case, i.e., the worst-case scenario, the interrupt latency is delayed by the amount of time needed to perform a complete round of NSVWs (i.e., $((\mathit{NSVW} - 1) \times \mathit{tick}) + \mathit{sched}_{\mathit{time}}$). However, for real-time applications, this latency may be prohibitive. Current efforts focus on extending uTango to implement a preemptive priority-based mechanism. In the following, we only explain the current implementation. However, for the sake of clarity, we illustrate an example of the execution flow of both approaches in Figure 4.
[Figure 4. Panels: (a) Non-preemptive world interrupt handling; (b) Priority-based world interrupt handling. Vertical axis: execution priority (uTango, NSVWs); horizontal axis: time, from boot through the timer and SPI ISRs and handlers, with WS points (including a forced WS in panel b).]
Fig. 4: uTango world interrupt handling.
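To get a feel for the worst-case figure in the non-preemptive scheme, the following sketch evaluates the round-based bound numerically. It assumes the bound has the form ((NSVW - 1) x tick) + sched_time; the "- 1" term and the function name `worst_case_latency_ms` are assumptions made for this illustration.

```python
def worst_case_latency_ms(num_worlds, tick_ms, sched_time_ms):
    """Worst-case wait before the NSVW owning a pending interrupt runs again.

    Assumes the bound ((NSVW - 1) * tick) + sched_time; the "- 1" is an
    assumption of this sketch, not a value confirmed by the text.
    """
    if num_worlds < 1:
        raise ValueError("at least one NSVW is required")
    return (num_worlds - 1) * tick_ms + sched_time_ms


if __name__ == "__main__":
    # Example inputs: 4 NSVWs, 10 ms tick, and a scheduling time of
    # 215 cycles at 40 MHz (the world switch time reported in Section VI-B).
    sched_ms = 215 / 40_000_000 * 1000
    print(worst_case_latency_ms(4, 10.0, sched_ms))
```

With a single NSVW the bound collapses to the scheduling time alone, which matches the intuition that the owning world is never waiting behind another one.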
Non-preemptive World Interrupt Handling.
Figure 4a illustrates the non-preemptive interrupt handling flow. The vertical axis depicts the execution environment and its respective execution priority. uTango is represented as the system's higher-priority workload (smaller number in priority level) since, by design, the secure world is more privileged than the normal world. NSVW … t, will only be served as soon as its respective NSVW is put into context (from t to t). For this particular case, the SPI interrupt latency is equal to t − t. The same behavior happens to the timer interrupt. The interrupt is triggered at t (while NSVW … t. uTango enables this non-preemptive behavior by saving and restoring each interrupt state during the context switching of NSVWs (t to t and t to t). In particular, at the first scheduling point (t to t), the WS will save the NVIC's state for the timer interrupt and restore the SPI's pending bit and security target before resuming the NSVW … t, if the SPI interrupt is enabled, the pended request will force the processor to attend the interrupt and start executing the respective SPI's interrupt handler.
Figure 4b illustrates the priority-based interrupt handling flow. NSVWs are assigned a priority (by the system designer) according to their criticality level. Higher-priority NSVWs will preempt the processor and be granted execution to attend asynchronous events. In contrast, low-priority NSVWs will be blocked from interrupting the processor during the execution of high-priority workloads.
[Figure 5. Labels, non-secure state: NSVW with a command line via serial protocol (UART); NSVW controlling a servo motor (PWM); TCP/IP stack providing a web service (SPI + Ethernet); Mbed TLS (keys/certificates) providing a TLS library; SYSTICK_NS. Labels, secure state: uTango (WS + SP + WCC), drivers (MPC + PPC), SYSTICK_S. Hardware platform: Arm Musca-B1.]
Fig. 5: uTango IoT reference application.
C. Worlds Communication
uTango implements a secure blocking and non-blocking request-response messaging mechanism to enable NSVWs to communicate with each other. Messages exchanged between NSVWs are limited to a 12-byte data stream and are sent via registers, avoiding sharing memory across worlds. The WCC uses the internal inbox structure of each NSVW to pass messages between senders and receivers. uTango exposes four secure APIs for sending and receiving messages. These APIs are implemented through secure entry points located in a pre-defined NSC memory region (light-blue NSC region in Figure 2), i.e., the WCC gateway. When an NSVW uses the sending API, the 12-byte data stream is copied to registers (i.e., r4-r6), which are read and placed into the receiver's inbox by the WCC. When the receiver NSVW reads the message, it calls the WCC via the receiving API, which copies the inbox message (if full) to the registers. The WCC carefully avoids information leakage by clearing the remaining CPU registers before returning to the active NSVW.
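The 12-byte limit follows directly from the three 32-bit registers r4-r6. The sketch below models that packing and a non-blocking inbox on the host. The helper names and the single-slot `Inbox` class are hypothetical; the text does not describe the inbox layout or the actual API signatures.

```python
import struct

MSG_BYTES = 12  # message size stated in the text (three 32-bit registers)


def pack_to_registers(payload: bytes):
    """Split a 12-byte message into three 32-bit words standing in for r4-r6."""
    if len(payload) != MSG_BYTES:
        raise ValueError("messages are exactly 12 bytes")
    return struct.unpack("<3I", payload)  # little-endian word order assumed


def unpack_from_registers(regs):
    """Rebuild the 12-byte message from the three register values."""
    return struct.pack("<3I", *regs)


class Inbox:
    """Hypothetical single-slot inbox kept by the WCC for one receiver."""

    def __init__(self):
        self._slot = None

    def send(self, regs):
        # Non-blocking send: refuse the message if the slot is still full.
        if self._slot is not None:
            return False
        self._slot = tuple(regs)
        return True

    def receive(self):
        # Non-blocking receive: returns None when the inbox is empty.
        regs, self._slot = self._slot, None
        return regs
```

A blocking variant would simply retry `send` or `receive` until it succeeds, which in the real system would mean yielding the CPU quantum between attempts.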
D. Reference IoT Application
Figure 5 depicts the uTango IoT reference application. It implements a set of building blocks aiming at demonstrating the applicability of uTango to develop secure IoT devices. These building blocks implement the main features required by IoT devices, ranging from secure connectivity to real-time operation and local management. Specifically, the NSVW … uTango TEE provides hardware-enforced separation among all consolidated workloads.
V. SECURITY ANALYSIS
TrustZone-M hardware primitives ensure hardware-enforced isolation of system resources, i.e., code, data, devices, and interrupts. Thus, TrustZone-M hardware hooks are sufficient to materialize uTango's vision of providing a new security model that allows the consolidation of multiple, equally-secure execution environments. Despite offering a strong foundation for system-level security, such as data and code protection, TrustZone does not define anti-tampering mechanisms or side-channel protection. Therefore, physical and side-channel attacks are out of scope, and we trust Arm to provide a non-compromised TrustZone-M design in the Musca-B1 platform.
Our threat model is based on the very same assumptions delivered by TrustZone-M [37], [39]. Firstly, we assume that an adversary cannot escalate an attack to the secure software (i.e., uTango) through a compromised piece of non-secure software (i.e., an NSVW) via a local or remote software attack. Secondly, our multi-world security model takes advantage of the same hardware mechanisms that isolate the secure and non-secure worlds to guarantee that a compromised NSVW cannot hijack other execution environments. In a nutshell, we must assume the following:
• NSVWs running on the non-secure side are untrusted.
• The uTango kernel, including all its software components, e.g., WS and SP, is trusted.
• TrustZone-M hardware security extensions guarantee strong isolation between secure and non-secure states.
• The misuse of the TrustZone-M hardware components, such as the SAU and IDAU, can compromise the security model. Thus, the configuration file which defines the overall system partition must be trusted.
Next, we discuss how uTango can mitigate potential attack vectors that a malicious adversary could exploit.
Protection of the uTango kernel.
Enabled by TrustZone hardware controllers, the first isolation layer of the system separates the uTango kernel from the NSVWs. When the kernel partitions the system using the SAU, it ensures that all NSVWs run in the normal world, while the kernel is kept protected in the secure world. The SAU controller thus validates all of an NSVW's memory transactions, blocking access to resources outside of its domain. NSVWs can also attempt to extract information from uTango through non-banked CPU registers. The attack can be carried out after a state transition from secure to non-secure, which can happen (i) after the boot stage or (ii) after context switching periods. The kernel prevents the leakage by (i) clearing all registers before jumping to the non-secure state and by (ii) inherently replacing the CPU state with the scheduled NSVW context.
Isolation of NSVWs.
The uTango partition prevents NSVWs from sharing memory regions, devices, and interrupt sources. The bi-directional isolation of the NSVWs is enforced by the SAU and additional SoC security gates. In TrustZone-M platforms, the configuration of the SAU and security gates is restricted to the secure world and managed by the uTango kernel. The uTango architecture allows developers to easily deploy third-party workloads. This feature can be leveraged by attackers to install malicious libraries or applications. Nevertheless, uTango's hardware-enforced isolation prevents a compromised NSVW from overcoming its boundaries and escalating to other domains.
Secure Bus Masters.
As discussed in Section IV, we assume that NSVW interactions with additional bus masters are protected and mediated by uTango. This design decision avoids the overhead of configuring each bus filter (i.e., MPC, PPC) during context switching, saving a considerable number of CPU clock cycles. Moreover, this also prevents an attacker from gaining access to the overall system's secure or non-secure memory. However, in an application scenario where an NSVW needs access to a DMA controller, uTango will offer a set of secure services to interface these hardware modules.
Secure Boot.
By default, the processor starts in secure state, enabling root-of-trust implementations such as secure boot. uTango features a 2-stage secure bootloader to verify the firmware image's integrity and authenticity against a stored signature (SHA-512) held in secure memory. Upon detecting a verification error, the uTango boot is aborted and the system is locked in a secure state until the next reset or power cycle. If the image is valid, the boot process continues until it hands control to the first NSVW to run.
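The verify-then-boot decision can be sketched on a host as follows. This is a simplified illustration that only compares a SHA-512 digest of the image with a stored reference value; the text does not specify the signature scheme or key handling used for authenticity, so those parts are omitted, and the function names and return strings are hypothetical.

```python
import hashlib
import hmac


def image_is_valid(image: bytes, stored_digest: bytes) -> bool:
    """Compare the SHA-512 digest of the image with the stored reference."""
    digest = hashlib.sha512(image).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(digest, stored_digest)


def boot(image: bytes, stored_digest: bytes) -> str:
    """Return the (hypothetical) next boot action for this image."""
    if not image_is_valid(image, stored_digest):
        return "locked"          # stay in secure state until reset / power cycle
    return "run_first_nsvw"      # hand control to the first NSVW
```

In a complete design the stored value would be covered by a digital signature checked against a trusted public key, so that an attacker who can rewrite the image cannot also rewrite the reference digest.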
Secure Services.
As uTango continues to mature, we intend to develop and include a set of secure services, such as secure storage, cryptographic operations, audit logs, and secure over-the-air updates. These secure services will be encapsulated in dedicated NSVWs. However, depending on the peculiarities of the target platform, some hardware modules may be hardwired to the secure world. To address this challenge, we envision the development of lightweight (de-privileged) secure gates to mediate access to secure world resources. Additional hardware primitives (i.e., secure MPU) will be leveraged to enforce isolation within the secure world.
VI. EVALUATION
We evaluated uTango on an Arm Musca-B1 Test Chip Board, which features two Cortex-M33 processors, running at 40 MHz (CPU0) and 160 MHz (CPU1). Firstly, in Section VI-A, we assess a set of security metrics derived from the BenchIoT suite [43]. In Section VI-B, we evaluate performance, and in Section VI-C we focus on the interrupt latency. Lastly, in Section VI-D, we evaluate code and binary sizes.
A. Security Metrics
BenchIoT [43] is a recent benchmark suite and evaluation framework for security solutions targeting IoT-based MCUs. The suite enables the automatic collection of 14 metrics for security, performance, memory usage, and energy consumption. As of this writing, BenchIoT only supports Armv7-M architectures [43], i.e., BenchIoT cannot be used to evaluate uTango directly. Notwithstanding, we performed a best-effort evaluation of uTango's security based on BenchIoT's eight security metrics while keeping as close as possible to the framework principles and metrics criteria.
According to the evaluation model presented in Ref. [43], we organized the security metrics into three goals: (i) minimizing privileged execution (i.e., total of privileged and system call cycles); (ii) enforcing memory isolation (i.e., maximum code and data region ratio); and (iii) control-flow hijacking protection (i.e., number of available ROP gadgets and indirect calls, and data execution prevention).
Minimizing Privileged Execution.
For Armv7-M MCUs, BenchIoT counts as privileged cycles all instructions executed in privileged thread and handler mode. For TrustZone-enabled Armv8-M MCUs, all CPU modes are banked, but the secure world is always considered more privileged than the normal world. Thus, we count as privileged cycles all instructions executed in the secure privileged thread and secure handler mode. In the context of our architecture, uTango is the single component running in secure privileged thread mode, at boot time, and secure handler mode, at run-time. All NSVWs run within the normal world, so none of their execution cycles are taken into consideration. Our results show that, at run-time, uTango runs a total of 215 privileged cycles at each system tick (Section VI-B). At boot-time, the total number of thread privileged cycles is 7749 (for 1 NSVW) and increases, on average, by 1236 cycles for each extra NSVW.
BenchIoT also counts the number of SVC cycles. Unprivileged code can leverage this instruction to trigger a system call intended to execute privileged thread code. This mechanism can be leveraged as a potential attack vector. In the context of TrustZone-M MCUs, SVC calls can be issued in secure and non-secure states. As aforementioned, non-secure SVCs are not taken into consideration as normal world code is always considered not privileged. uTango does not issue any SVC call, i.e., the number of secure SVC calls is 0. The execution flow has a well-defined entry and exit point, always running in secure handler mode.
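The boot-time figures above define a simple linear model. The sketch below evaluates it for a given number of NSVWs; it treats the reported average increment as exact, which is an assumption of this illustration, and the function and constant names are introduced here only for the example.

```python
BOOT_CYCLES_ONE_NSVW = 7749    # privileged boot cycles reported for 1 NSVW
CYCLES_PER_EXTRA_NSVW = 1236   # reported average increase per extra NSVW
CPU_HZ = 40_000_000            # CPU0 frequency of the evaluation platform


def boot_privileged_cycles(num_worlds: int) -> int:
    """Estimated privileged boot cycles, treating the average as exact."""
    if num_worlds < 1:
        raise ValueError("at least one NSVW is required")
    return BOOT_CYCLES_ONE_NSVW + CYCLES_PER_EXTRA_NSVW * (num_worlds - 1)


def cycles_to_us(cycles: int, cpu_hz: int = CPU_HZ) -> float:
    """Convert a cycle count to microseconds at the given clock frequency."""
    return cycles / cpu_hz * 1e6
```

For example, four NSVWs give an estimate of 7749 + 3 x 1236 = 11457 cycles, which is roughly 286 microseconds at 40 MHz.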
Enforcing Memory Isolation.
Another two security metrics evaluated by BenchIoT are the (i) maximum data region ratio and (ii) maximum code region ratio. These two metrics aim at assessing memory isolation's effectiveness by computing the ratio of the size of the maximum code/data region available to an attacker to the total code/data size of the application binary [43]. uTango isolates each environment within a strong compartment, with the boundaries of each NSVW defined per binary needs. Thus, the maximum data and code region ratio is 0.
Control-Flow Hijacking Protection.
Code reuse attacks (i.e., ROP gadgets and indirect calls) are among the most common attack vectors used to hijack the control flow of an application. To measure the number of ROP gadgets, we used the ROPgadget tool [44]. uTango has a total of 303 ROP gadgets. Notwithstanding, a deeper investigation unveiled that these gadgets belong to the boot-related code and are not executed during runtime. As stated in Section IV-A, all the world scheduling logic is implemented in assembly. Although this number is an order of magnitude smaller compared to ATF-M and the results presented in Ref. [43], we aim to reduce this value in the near future by also implementing the boot logic in assembly or leveraging inline substitution optimizations. Indirect calls are another type of code reuse attack that relies on using function pointers to hijack the control flow. In the case of uTango, we parsed the binary file and found only one indirect call, related to the secure to non-secure exit point and issued through a BLXNS instruction.
Another important aspect in defending against control-flow hijacking is related to data execution prevention (DEP) mechanisms. In the context of Arm MCUs, proposed defense mechanisms leverage the MPU to enforce memory regions as either writable (data) or executable (code) [43]. uTango currently leverages the secure MPU to enforce DEP among kernel code and data sections. Furthermore, as mentioned in Section V, we will also leverage the secure MPU to enforce isolation and deploy a DEP defense mechanism among security gates.
B. Performance Overhead
World Switch Time.
The world switch time is defined as the amount of time that the uTango kernel takes to switch between NSVWs. As explained in Section IV-A, this operation includes saving and restoring the worlds' context (i.e., core registers, system registers, and NVIC), re-configuring the SAU regions (to enforce memory isolation), and running the scheduler algorithm. To measure the world switch time, we used the Data Watchpoint and Trace (DWT) unit [45] from the CoreSight debug system, which features a 32-bit cycle counter running at the CPU clock frequency. The DWT cycle counter is read before and after completing the WS operation. We collected 1000 samples. All the collected samples reported exactly 215 clock cycles (i.e., 5.4 microseconds @ 40 MHz). The high determinism reflects (i) the characteristics of the Armv8-M architecture and (ii) the fact that uTango runs from a TCM (Section IV-A). The reduced world switch time is a consequence of (i) the raw assembly implementation of the WS logic and (ii) the TCM. For instance, when configured with a 10 milliseconds (ms) tick rate, the expected performance penalty is a negligible 0.054%.
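The two numbers quoted above (5.4 microseconds and 0.054%) follow from the cycle count, the clock frequency, and the tick period. The sketch below reproduces that arithmetic, assuming one world switch per tick; the function names are chosen for this example only.

```python
def switch_time_us(cycles: int, cpu_hz: int) -> float:
    """World switch duration in microseconds."""
    return cycles / cpu_hz * 1e6


def expected_overhead_percent(cycles: int, cpu_hz: int, tick_ms: float) -> float:
    """Share of each tick spent switching, assuming one switch per tick."""
    tick_us = tick_ms * 1000.0
    return switch_time_us(cycles, cpu_hz) / tick_us * 100.0


if __name__ == "__main__":
    # 215 cycles at 40 MHz with a 10 ms tick, as reported in the text.
    print(switch_time_us(215, 40_000_000))
    print(expected_overhead_percent(215, 40_000_000, 10.0))
```

With the reported inputs the switch takes 5.375 microseconds and the expected overhead is 0.05375%, which round to the 5.4 microseconds and 0.054% stated in the text.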
Run-time Overhead.
To evaluate the run-time overhead, we used Embench (version 0.5) [46]. Embench is a free and open-source benchmark suite specially designed for deeply embedded systems. Assuming the presence of no OS and minimal C library support, Embench targets small devices with ≤ 64 KiB of Flash (ROM) and ≤ 16 KiB of RAM. Embench consists of 19 real programs, representative of the following metrics: branch, memory, and computing requirements. Each benchmark reports a single summarizing performance score that outputs the geometric mean and geometric standard deviation ratios relative to a reference platform or setup, which in our case is the target Musca-B1 Test Chip Board. Benchmarks and uTango were compiled using the target platform and toolchain configuration described in Table II. Despite the Arm Musca-B1 featuring a dual asymmetric Cortex-M33 MCU, uTango currently only supports a single-core configuration. Thus, in our experiments, we only enabled CPU0, running at 40 MHz.
[Figure 6. Panels: (a) 0.5 ms uTango tick; (b) 10 ms uTango tick. Bars per benchmark: Baseline, 1 NSVW, 2 NSVW, 3 NSVW, 4 NSVW; vertical axis: ratio; execution times in ms annotated above the bars.]
Fig. 6: Performance overheads (ratio) of Embench benchmark suite relative to bare-metal execution.
Platform
  frequency: 40 MHz (CPU0)
  max. frequency: 160 MHz (CPU1)
  architecture: Armv8-M
  isa: Thumb/Thumb-2
  address size:
  code memory size: 512 KiB eSRAM
  data memory size: 512 KiB iSRAM
  processor name: Cortex-M33
  caches:
  active cores:
Toolchain
  compiler version: GNU Arm Embedded Toolchain arm-none-eabi-gcc 9.3.1
  linker version: GNU binutils ld version 2.34.0
  compiler flags: -Os -march=armv8-m.main -mcpu=cortex-m33+nodsp -ffunction-sections -mfloat-abi=softfp -mthumb
  linker flags: -O2 -Wl,-gc-sections -march=armv8-m.main -mcpu=cortex-m33+nodsp -mfloat-abi=softfp -mthumb -specs=nosys.specs
  libs: 'user libs': ['-lm']
TABLE II: Platform, toolchain, and compilation details.
We ran the benchmarks natively on the target platform. Then, each benchmark was executed with uTango, configured to support 1, 2, 3, and 4 NSVWs. The benchmarks were set to run in the first NSVW environment, while the other NSVWs run a toy bare-metal application implementing a bare infinite loop (i.e., other worlds just consume their CPU quantum). Experiments were repeated for different tick rate configurations, ranging from 0.5 ms to 10 ms. The results for 0.5 and 10 ms are illustrated in Figure 6, where each bar (representing 1, 2, 3, and 4 NSVWs) depicts the respective ratio relative to the baseline. On top of the first and second bars, we also present the absolute execution time in clock cycles and the total execution time in ms (within parentheses).
Looking at Figure 6, we can draw four main conclusions. Firstly, for a configuration with 1 NSVW, the overhead introduced by uTango is almost residual. For instance, for a 10 ms tick rate, the average performance overhead is 0.05%, which is within the expected theoretical overhead (see world switch time). For a 500 microseconds tick rate, which is considered an abnormally high switching rate (i.e., a highly responsive system), the average performance overhead is less than 1%. Secondly, we can observe that the performance overhead increases (almost) linearly with the number of NSVWs, i.e., the third, fourth, and fifth bars (2, 3, and 4 NSVWs, respectively) increase the performance overhead by a similar ratio. This impact is expected and is a natural consequence of sharing the CPU among all NSVWs, i.e., each NSVW gets a CPU quantum equal to the tick rate. A third and interesting observation is related to the heterogeneity of the performance overhead ratio when the system is configured with multiple NSVWs, in particular for 3 and 4 NSVWs. This phenomenon becomes even more evident for the experiments conducted with a 10 ms tick rate.
We observed that increasing the uTango tick rate may suggest that, for the majority of the benchmarks, the system gets an increase of performance (i.e., a decrease of the performance overhead ratio). The most evident case is observed for the huffbench benchmark, which, for a 10 ms tick rate, suggests that there is no performance penalty, i.e., the ratio is always 1 no matter how many NSVWs are running in the system. While this may look a bit surprising at first sight, a deeper investigation pointed out that this is a consequence of the execution time of the benchmark. As presented on top of the first two bars (in parentheses), the native execution time of the benchmarks ranges from a few ms to dozens of ms. For instance, the bare execution of huffbench takes around 4.39 ms while nettle-aes takes 133.22 ms. So, when the system is sharing the CPU among multiple NSVWs, and depending on the tick rate, the benchmark may finish in fewer rounds, decreasing the performance overhead ratio. We also repeated the experiments for tick rates of 1, 2, and 5 ms, and these again show the explained pattern very clearly. In a corner case, for a 150 ms tick rate, all benchmarks for all NSVW configurations would present a performance overhead ratio of 1. Finally, and as a complement to the phenomenon described above, we also observe that there are some exotic benchmarks, i.e., crc32, nbody, picojpeg, and qrduino, that present an increase of the performance overhead ratio. In this case, the apparent increase of performance overhead is justified by the warm-up time of the benchmark. Thus, when a higher warm-up time is required, most of the available time slots in the first round are wasted, which results in an additional impact on the final performance overhead ratio.
[Figure 7. Panels: (a) Performance overhead for 1, 2, 3, and 4 NSVWs; (b) Zoomed view of the performance overhead for 1 NSVW. Vertical axis: geometric mean ratios.]
Fig. 7: Performance overhead vs variation of uTango tick for different configurations.
Figure 7 depicts the impact of the uTango tick rate variation on the overall performance overhead. We repeated this experiment for four different tick rates (0.5 ms, 1 ms, 2 ms, and 10 ms), where each obtained value corresponds to the geometric mean ratio for the full run of the Embench suite running in systems configured with different numbers of NSVWs. The obtained results validate the phenomenon described above: while increasing the uTango tick rate, there is an apparent decrease of the performance overhead (Figure 7). Looking at Figure 7b, we can also conclude that the performance overhead increases exponentially while decreasing the tick rate. However, this exponential increase is very acceptable because, for a 500 microseconds tick rate, the performance overhead is less than 1%. This impact will be less noticeable on platforms running at a higher frequency, e.g., the NXP LPC55S69-EVK and STM NUCLEO-L552ZE-Q: running at a frequency approximately three times higher, the overhead would decrease by a factor of three.
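The dependence of the measured ratio on the benchmark length can be reproduced with a simple round-robin model. The sketch below is an idealization and not the authors' measurement method: it assumes the benchmark starts at the beginning of its own quantum, ignores the world switch cost and any warm-up, and uses a hypothetical function name.

```python
import math


def overhead_ratio(native_ms: float, tick_ms: float, num_worlds: int) -> float:
    """Completion time relative to native execution under round-robin sharing.

    Idealized model: the benchmark runs in one of num_worlds worlds, starts
    at the beginning of its quantum, and switch cost is ignored.
    """
    # Number of quanta the benchmark needs to finish its work.
    quanta = math.ceil(native_ms / tick_ms)
    # After every quantum except the last, it waits for the other worlds.
    waiting_ms = (quanta - 1) * (num_worlds - 1) * tick_ms
    return (native_ms + waiting_ms) / native_ms
```

Under this model a 4.39 ms benchmark such as huffbench finishes inside its first 10 ms quantum and yields a ratio of 1 for any number of NSVWs, while a 133.22 ms benchmark with two worlds and a 10 ms tick needs 14 quanta and waits 130 ms in total. A 150 ms tick lets even the 133.22 ms benchmark finish in one quantum, in line with the corner case mentioned above.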
C. Interrupt Latency
To measure the interrupt latency, we crafted a minimal bare-metal benchmark application, running in the NSVW … uTango tick rates (0.5 ms and 10 ms), while varying the number of NSVWs to be scheduled. The results were obtained by taking 1000 samples. Figure 8 depicts the relative frequency of each interrupt latency measurement. The results are expressed in the number of clock cycles required by the CPU to start executing the timer handler. According to Figure 8, we can draw two major conclusions.
[Figure 8. Panels: (a) uTango with 0.5 ms tick; (b) uTango with 10 ms tick. Vertical axis: relative frequency.]
Fig. 8: Relative frequency of interrupt latency expressed in clock cycles.
Firstly, depending on when the interrupt is triggered, we can achieve better or worse latencies: (1) if the interrupt is triggered while the NSVW that handles it is executing, the measured latency is near its native value (observed for all configurations with one world). The NSVW receives the interrupt transparently through normal hardware interrupt behavior, and the final interrupt handler executes within a fixed 24 clock cycles; (2) if the interrupt triggers when a different NSVW is active, the interrupt latency increases significantly since the interrupt will only be handled when the assigned NSVW is scheduled.
The second takeaway comes from observing a direct relation between the system tick rate, the number of NSVWs, and the time interval selected for the interrupting timer (10 ms). Considering the scenario with 2 NSVWs, depicted in Figure 8a, the interrupt latency for the 1000 collected samples shows a relative frequency of 100% for 24 clock cycles. Such results are explained by the 10 ms interval being completed in a whole number of rounds when two worlds are configured (each round takes 1 ms at a tick rate of 0.5 ms, i.e., 20 ticks). This means that the timer interrupt will be triggered approximately when the handling world is executing. On the other hand, in the scenario with 3 NSVWs, each round takes 1.5 ms to complete; therefore, the system needs ≈ …
D. Code and Binary Size
uTango was developed from scratch with no dependencies on compiler or external libraries. Table III reports (i) the number of SLoC and (ii) the binary size.
Source lines of code.
To count the number of SLoC, we used the SLOCCount tool [47]. The uTango implementation code is divided into three main directories: (i) arch, targeting Armv8-M architecture-specific functionalities; (ii) platform, containing platform-specific code (e.g., memory and peripheral protection controllers); and (iii) core, i.e., uTango boot and scheduler logic (e.g., memory and devices partition, and system timer configuration). From Table III, it is possible to conclude that the architectural and platform-specific code represent most of the total SLoC. Since uTango's heavy lifting work is during boot-time, i.e., system resources partition, hardware initialization, and configuration, it is normal that these two components reflect the major part of the uTango code complexity (≈ … ≈ 200 SLoC, corresponding to 4.6% of the total SLoC.
Directory            SLoC (C / asm / total)    Size in bytes (.text / .data / .bss / total)
/arch/armv8-m        787 / 393 / 1180          1336 / 0 / 0 / 1336
/platform/MUSCAB1
/core                264 / 0 / 264             552 / 0 / 652 / 1204
/config (2 worlds)   93 / 0 / 93               0 / 108 / 0 / 108
total
TABLE III: uTango: SLoC and binary size (bytes) per directory.
Binary size.
To measure the size (bytes) of uTango, we used the GCC size tool (Berkeley format). Table III presents the text, data, and bss sections according to system component, i.e., organized by directories. As highlighted above, target-specific functionality (e.g., SAU, SysTick, MPC, and PPC drivers) included in the arch and platform directories represents approximately 2/3 of uTango's total size. At boot-time, the uTango core allocates the WCB structure and performs initialization routines. For each configured world, the system allocates 324 bytes of data for its private WCB. During the WCB's initialization, uTango retrieves each world's configuration from the config structure (60 bytes). This structure is filled by the system designer to describe, per world, the memory layout, available devices, and assigned interrupts. Regarding run-time code, the total size is 488 bytes, which represents the code implementing the scheduling logic. Thus, the resulting TCB size is 4.3 KiB.
VII. RELATED WORK
There is a rich body of runtime environments, isolation techniques and mechanisms, and architectures for secure execution and isolated environments [7], [9]–[23], [48]. Due to the extensive list of works, we focus on the two following classes of solutions that target IoT devices powered by resource-constrained MCUs.
TEE systems for the IoT.
TEE systems for resource-constrained IoT devices are in their infancy, and only a few commercial and academic solutions have been proposed so far. Janjua et al. [17] have developed the Security MicroVisor (SμV), a pure-software TEE for resource-constrained devices that lack basic hardware-based security features such as an MPU (e.g., AVR ATmega). MultiZone TEE [26] is an innovative hardware-enforced, software-defined TEE for (Armv7-M) Cortex-M and RISC-V MCUs. MultiZone leverages the Arm MPU or the RISC-V Physical Memory Protection (PMP) to create multiple isolated environments. In the context of TrustZone-M MCUs, ATF-M [35] provides an open-source reference implementation of a TEE for Armv8-M devices. Kinibi-M [36] and ProvenCore-M [49] are preeminent examples of commercial TEE solutions adapted from existing well-established Cortex-A implementations. mTower [50] is an open-source initiative from Samsung aiming at developing a TEE specially designed to protect size-constrained IoT devices based on the Cortex-M23. Contrary to the aforementioned TrustZone-M solutions, which are a strict materialization of the TrustZone dual-world architecture, uTango relies on a multi-world design, providing multiple isolated environments within the normal world, and thus addressing the main architectural deficiencies observed in commercial TrustZone systems while providing augmented TEE capabilities. To the best of our knowledge, MultiZone [26] is the closest solution to uTango. Notwithstanding, compared to our approach, MultiZone for Arm Cortex-M (i) requires static binary translation to handle special privileged instructions and imprecise bus faults and (ii) implements trap and emulation. There is also a prominent class of solutions that proposes a set of mechanisms for TrustZone-M devices. CoreLockr-TZ [51] is a lightweight service dispatch layer and CFI CaRE [27] implements a prime control-flow integrity (CFI) mechanism. Finally, ASSURED [28] proposes a secure firmware update framework for TrustZone-M devices.
Reliable systems for the IoT.
Classic approaches to provide isolation and implement reliable systems on resource-constrained devices have been evolving from constructive (language/compiler-based) memory protection [52]–[56] and hardware-enforced RTOS mechanisms [21], [57], [58], to lightweight virtualization infrastructures [5], [6], [59], [60]. Tock [53] leverages limited hardware protection mechanisms as well as the type-safety features of the Rust programming language to provide a reliable multiprogramming environment for MCUs. EPOXY [54] proposes a technique called privilege overlaying, and uXOM [55] implements a protection mechanism that leverages the LLVM compiler to translate all memory instructions into unprivileged ones, constraining the code region using the MPU available on Cortex-M MCUs. In [56], Peach et al. propose eWASM, which is a runtime to constrain memory accesses and control flow, enabled by the aWsm compiler. Several widespread embedded (RT)OSes such as Mbed OS [61], FreeRTOS [62], and Zephyr [63] already have upstream support for task isolation using the MPU. In [21], Hahm et al. have enhanced TizenRT with user-level reliability mechanisms, i.e., fault isolation and fault recovery. Another class of approaches has proposed lightweight virtualization solutions for resource-constrained devices. F. Paci et al. [60] proposed a lightweight I/O virtualization approach using the MPU support on FreeRTOS to create a task which mediates all I/O accesses. F. Bruns et al. [59] and R. Pan et al. [5] have proposed virtualization infrastructures leveraging the MPU. Pinto et al. [6] have also proposed a TrustZone-based virtualization solution for Cortex-M MCUs.
VIII. CONCLUSION
In this paper, we presented U T ANGO , the first multi-worldTEE for TrustZone-M IoT devices. Our innovative designenables the execution of multiple environments within stronglyisolated compartments with increasing flexibility and securityguarantees. U T ANGO will be publicly available in hopes ofengaging both academia and industry on research and deploy-ment of innovative TEE solutions for the IoT.R
EFERENCES[1] S. L. Keoh, S. S. Kumar, and H. Tschofenig, “Securing the internet ofthings: A standardization perspective,”
IEEE Internet of Things Journal, vol. 1, no. 3, pp. 265–275, 2014.
[2] O. Alrawi, C. Lever, M. Antonakakis, and F. Monrose, “SoK: Security Evaluation of Home-Based IoT Deployments,” in , 2019, pp. 1362–1380.
[3] E. Cozzi, P.-A. Vervier, M. Dell’Amico, Y. Shen, L. Bilge, and D. Balzarotti, “The Tangled Genealogy of IoT Malware,” in Annual Computer Security Applications Conference (ACSAC), 2020, p. 16.
[4] A.-R. Sadeghi, C. Wachsmann, and M. Waidner, “Security and privacy challenges in industrial Internet of Things,” in , 2015, pp. 1–6.
[5] R. Pan, G. Peach, Y. Ren, and G. Parmer, “Predictable virtualization on memory protection unit-based microcontrollers,” in , 2018, pp. 62–74.
[6] S. Pinto, H. Araujo, D. Oliveira, J. Martins, and A. Tavares, “Virtualization on TrustZone-Enabled Microcontrollers? Voilà!” in , 2019, pp. 293–304.
[7] G. Klein, K. Elphinstone, G. Heiser, J. Andronick, D. Cock, P. Derrin, D. Elkaduwe, K. Engelhardt, R. Kolanski, M. Norrish, T. Sewell, H. Tuch, and S. Winwood, “seL4: Formal Verification of an OS Kernel,” in
ACM SOSP, New York, NY, USA, 2009, pp. 207–220.
[8] M. Payer and T. R. Gross, “Fine-Grained User-Space Security through Virtualization,” in Proceedings of the 7th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, ser. VEE ’11. New York, NY, USA: Association for Computing Machinery, 2011, pp. 157–168.
[9] P. Koeberl, S. Schulz, A.-R. Sadeghi, and V. Varadharajan, “TrustLite: A Security Architecture for Tiny Embedded Devices,” in Proceedings of the Ninth European Conference on Computer Systems, ser. EuroSys ’14. New York, NY, USA: Association for Computing Machinery, 2014.
[10] F. Brasser, B. E. Mahjoub, A.-R. Sadeghi, C. Wachsmann, and P. Koeberl, “TyTAN: Tiny trust anchor for tiny devices,” in , 2015, pp. 1–6.
[11] H. Sun, K. Sun, Y. Wang, J. Jing, and H. Wang, “TrustICE: Hardware-assisted isolated computing environments on mobile devices,” in
Dependable Systems and Networks (DSN), 2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks. IEEE, 2015, pp. 367–378.
[12] Y. Cho, J. Shin, D. Kwon, M. Ham, Y. Kim, and Y. Paek, “Hardware-Assisted On-Demand Hypervisor Activation for Efficient Security Critical Code Execution on Mobile Devices,” in . Denver, CO: USENIX Association, Jun. 2016, pp. 565–578.
[13] V. Costan, I. Lebedev, and S. Devadas, “Sanctum: Minimal Hardware Extensions for Strong Software Isolation,” in . Austin, TX: USENIX Association, Aug. 2016, pp. 857–874.
[14] A. M. Azab, K. Swidowski, R. Bhutkar, J. Ma, W. Shen, R. Wang, and P. Ning, “SKEE: A Lightweight Secure Kernel-level Execution Environment for ARM,” in Proceedings of the Network and Distributed System Security Symposium, 2016.
[15] A. Ferraiuolo, A. Baumann, C. Hawblitzel, and B. Parno, “Komodo: Using verification to disentangle secure-enclave hardware from software,” in Proceedings of the 26th Symposium on Operating Systems Principles, ser. SOSP ’17. New York, NY, USA: Association for Computing Machinery, 2017, pp. 287–305. [Online]. Available: https://doi.org/10.1145/3132747.3132782
[16] P. Maene, J. Götzfried, R. de Clercq, T. Müller, F. Freiling, and I. Verbauwhede, “Hardware-Based Trusted Computing Architectures for Isolation and Attestation,”
IEEE Transactions on Computers, vol. 67, no. 3, pp. 361–374, 2018.
[17] H. Janjua, M. Ammar, B. Crispo, and D. Hughes, “Towards a Standards-Compliant Pure-Software Trusted Execution Environment for Resource-Constrained Embedded Devices,” in Proceedings of the 4th Workshop on System Software for Trusted Execution, ser. SysTEX ’19, New York, NY, USA, 2019.
[18] F. Brasser, D. Gens, P. Jauernig, A.-R. Sadeghi, and E. Stapf, “SANCTUARY: ARMing TrustZone with User-space Enclaves,” in Network and Distributed Systems Security (NDSS) Symposium, 2019.
[19] S. Pinto and N. Santos, “Demystifying Arm TrustZone: A Comprehensive Survey,”
ACM Comput. Surv., vol. 51, no. 6, pp. 130:1–130:36, Jan. 2019.
[20] W. Li, Y. Xia, L. Lu, H. Chen, and B. Zang, “TEEv: Virtualizing Trusted Execution Environments on Mobile Platforms,” in Proceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, ser. VEE 2019. New York, NY, USA: ACM, 2019, pp. 2–16.
[21] S. il Hahm, J. Kim, A. Jeong, H. Yi, S. Chang, K. SN, A. Chauhan, and S. P. Cherian, “Reliable Real-Time Operating System for IoT Devices,” IEEE Internet of Things Journal, pp. 1–1, 2020.
[22] D. Lee, D. Kohlbrenner, S. Shinde, K. Asanović, and D. Song, “Keystone: An Open Framework for Architecting Trusted Execution Environments,” in
Proceedings of the Fifteenth European Conference on Computer Systems, ser. EuroSys ’20. New York, NY, USA: Association for Computing Machinery, 2020.
[23] R. Bahmani, F. Brasser, G. Dessouky, P. Jauernig, M. Klimmek, A.-R. Sadeghi, and E. Stapf, “CURE: A Security Architecture with CUstomizable and Resilient Enclaves,” in . Vancouver, B.C.: USENIX Association, Aug. 2021.
[24] N. Santos, H. Raj, S. Saroiu, and A. Wolman, “Using ARM TrustZone to build a trusted language runtime for mobile applications,” SIGARCH Comput. Archit. News, vol. 42, no. 1, pp. 67–80, Feb. 2014.
[25] Z. Hua, J. Gu, Y. Xia, H. Chen, B. Zang, and H. Guan, “vTZ: Virtualizing ARM TrustZone,” in USENIX Security Symposium. USENIX Association, 2017, pp. 541–556.
[26] S. Pinto and C. Garlati, “Multi Zone Security for Arm Cortex-M Devices,” in
Embedded World Conference 2020, no. March, 2020, p. 6.
[27] T. Nyman, J. Ekberg, L. Davi, and N. Asokan, CFI CaRE: Hardware-Supported Call and Return Enforcement for Commercial Microcontrollers. Springer International Publishing, 2017, pp. 259–284.
[28] N. Asokan, T. Nyman, N. Rattanavipanon, A.-R. Sadeghi, and G. Tsudik, “ASSURED: Architecture for Secure Software Update of Realistic Embedded Devices,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 37, no. 11, pp. 2290–2300, Nov. 2018.
[29] N. Zhang, H. Sun, K. Sun, W. Lou, and Y. T. Hou, “CacheKit: Evading Memory Introspection Using Cache Incoherence,” in
IEEE European Symposium on Security and Privacy, March 2016, pp. 337–352.
[30] M. Lipp, D. Gruss, R. Spreitzer, C. Maurice, and S. Mangard, “ARMageddon: Cache Attacks on Mobile Devices,” in USENIX Conference on Security Symposium. USENIX Association, 2016, pp. 549–564.
[31] A. Machiry, E. Gustafson, C. Spensky, C. Salls, N. Stephens, R. Wang, A. Bianchi, Y. R. Choe, C. Kruegel, and G. Vigna, “BOOMERANG: Exploiting the Semantic Gap in Trusted Execution Environments,” in Network and Distributed System Security Symposium, 2017.
[32] A. Tang, S. Sethumadhavan, and S. Stolfo, “CLKSCREW: Exposing the perils of security-oblivious energy management,” in
USENIX Security Symposium. USENIX Association, 2017, pp. 1057–1074.
[33] K. Ryan, “Hardware-Backed Heist: Extracting ECDSA Keys from Qualcomm’s TrustZone,” in Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’19. New York, NY, USA: Association for Computing Machinery, 2019, pp. 181–194.
[34] D. Cerdeira, N. Santos, P. Fonseca, and S. Pinto, “SoK: Understanding the Prevailing Security Vulnerabilities in TrustZone-assisted TEE Systems,” in
IEEE Symposium on Security and Privacy (SP)
Embedded World Conference 2020, no. March, 2020, p. 5.
[42] Arm, “Arm® Musca-A test chip and board technical reference manual,” Arm Ltd., Tech. Rep., Jan. 2018.
[43] N. S. Almakhdhub, A. A. Clements, M. Payer, and S. Bagchi, “BenchIoT: A security benchmark for the internet of things,” in
Journal of Open Source Software
International Conference on Embedded Networked Sensor Systems (SenSys), 2007, pp. 205–218.
[53] A. Levy, B. Campbell, B. Ghena, D. B. Giffin, P. Pannuto, P. Dutta, and P. Levis, “Multiprogramming a 64kB Computer Safely and Efficiently,” in Symp. on Operating Systems Principles (SOSP), 2017, pp. 234–251.
[54] A. A. Clements, N. S. Almakhdhub, K. S. Saab, P. Srivastava, J. Koo, S. Bagchi, and M. Payer, “Protecting Bare-Metal Embedded Systems with Privilege Overlays,” in , 2017, pp. 289–303.
[55] D. Kwon, J. Shin, G. Kim, B. Lee, Y. Cho, and Y. Paek, “uXOM: Efficient execute-only memory on ARM Cortex-M,” in . Santa Clara, CA: USENIX Association, Aug. 2019, pp. 231–247.
[56] G. Peach, R. Pan, Z. Wu, G. Parmer, C. Haster, and L. Cherkasova, “eWASM: Practical Software Fault Isolation for Reliable Embedded Devices,”
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 39, no. 11, pp. 3492–3505, 2020.
[57] O. Hahm, E. Baccelli, H. Petersen, and N. Tsiftes, “Operating systems for low-end devices in the internet of things: A survey,” IEEE Internet of Things Journal, vol. 3, no. 5, pp. 720–734, 2016.
[58] M. Silva, D. Cerdeira, S. Pinto, and T. Gomes, “Operating Systems for Internet of Things Low-End Devices: Analysis and Benchmarking,” IEEE Internet of Things Journal, vol. 6, no. 6, pp. 10 375–10 383, 2019.
[59] F. Bruns, D. Kuschnerus, and A. Bilgic, “Virtualization for Safety-critical, Deeply-embedded Devices,” in ACM Symposium on Applied Computing (SAC), 2013, pp. 1485–1492.
[60] F. Paci, D. Brunelli, and L. Benini, “Lightweight IO virtualization on MPU enabled microcontrollers,”