App Developer Centric Trusted Execution Environment

Abstract

ARM TrustZone is the de facto hardware TEE implementation on mobile devices such as smartphones. As a vendor-centric TEE, TrustZone largely overlooks the strong protection demands and requirements of App developers. Several security solutions have been proposed to enable TEE-assisted isolation in the Normal World of ARM, attempting to balance security and usability. However, they are still not full-fledged in serving Apps' needs. In this paper, we introduce LEAP, a lightweight App developer Centric TEE solution in the Normal World. LEAP offers an auto DevOps tool to help developers prepare the code running on it, enables isolated code to execute in parallel and access peripherals (e.g., mobile GPUs) with ease, and dynamically manages system resources upon Apps' requests. We implement the LEAP prototype on an off-the-shelf ARM platform without any hardware change. We perform comprehensive analyses and experiments to demonstrate that LEAP is efficient in design, comprehensive in support, and convenient in adoption.



Lizhi Sun

National Key Lab for Novel Software Technology, Nanjing University, Nanjing, China

Shuocheng Wang

National Key Lab for Novel Software Technology, Nanjing University, Nanjing, China

Hao Wu

National Key Lab for Novel Software Technology, Nanjing University, Nanjing, China

Yuhang Gong

National Key Lab for Novel Software Technology, Nanjing University, Nanjing, China

Fengyuan Xu ∗

National Key Lab for Novel Software Technology, Nanjing University, Nanjing, China

Yunxin Liu

Microsoft Research, Beijing, China

Hao Han

Nanjing University of Aeronautics and Astronautics, Nanjing, China

Sheng Zhong

National Key Lab for Novel Software Technology, Nanjing University, Nanjing, China


∗ Corresponding author. Email: [email protected]

Conference’17, July 2017, Washington, DC, USA

The TrustZone technology was introduced in 2003 as the Trusted Execution Environment (TEE) design on the ARM architecture. It has been pervasively deployed on mobile devices and continuously upgraded with new features since then. Unfortunately, TrustZone, although providing stronger security guarantees than software-based security, is currently underutilized or even unpopular among App developers for the following reasons.

TrustZone is vendor-centric rather than developer-centric. Every code change in TrustZone must be approved in advance by vendors, and resources in TrustZone are extremely limited. Such inconvenient restrictions significantly impede the adoption of TrustZone in App security. Moreover, adopting TrustZone requires substantial development effort and TEE knowledge on the App side, especially for existing Apps. Otherwise, vulnerabilities could be introduced and lead to compromise of the TEE [16]. Additionally, rapid App development has dramatically reshaped the mobile computing landscape since 2003. Emerging App security demands, e.g., mobile GPU access during secure execution, have not been recognized or supported through the evolution of TrustZone.

Research works have recently been carried out to build App-friendly security solutions on top of TrustZone (shown in Figure 1). These TEE-based solutions are carefully designed to isolate the execution of protected code in the Normal World (NW) of the ARM architecture, rather than in the Secure World (SW), to mitigate the resource problem. TrustICE [39] first attempts to move the APP ENV (containing protected code) out of SW. It allows only one APP ENV to run and meanwhile freezes the whole ROS, sacrificing efficiency. PrivateZone [28] lifts the restriction of the frozen ROS by introducing another layer of isolation in NW. OSP [17] further enables the parallel running of multiple APP ENVs. Unfortunately, this feature is achieved by introducing an extra TCB (Trusted Computing Base) hypervisor, which increases system overhead and security risks. Most recently, SANCTUARY [15] leverages a new TrustZone feature to remove the need for a hypervisor and keeps the ability of one-time APP ENV resource assignment like OSP. However, it does not support parallel APP ENVs. More details on related works are provided in Section 2.3.

The NW-side TEE solutions above, although balancing security and usability for TrustZone, are not fully developer-centric with respect to the aforementioned problems TrustZone faces.

First, their secure environments (APP ENVs) lack comprehensive support for App code execution, specifically parallel isolated environments, secured peripheral access, and dynamic resource management. Parallelism is an important strategy for optimizing performance on multi-core systems, which are widely used in smartphones; more and more code that requires protection contains operations that access peripherals, e.g., receiving a cloud-pushed patch inside the APP ENV or exclusively accessing the mobile GPU for deep learning; adapting resources to online demands is necessary for parallel isolated environments to reduce resource waste and meanwhile survive bursty workloads. Second, the difficulty of solution adoption is not considered for App developers, especially developers of existing Apps. Usually, developers are required to manually modify App code according to the targeted TEE-based solution and calculate the resource assignment beforehand. This inconvenience greatly demotivates App developers from adopting the solution. Third, emerging developer demands are omitted. For example, many intelligent Apps have appeared, and they deploy their deep learning (DL) models, which are often intellectual property, on devices to provide timely services. Existing NW-side TEE solutions cannot protect DL execution well, as there is no mobile GPU support inside the APP ENV.

Figure 1: High-level design comparison between our work LEAP and its related works. ROS is the Rich OS like Android in the Normal World of ARM. APP ENV is the securely-isolated execution environment for protected codes. Boxes labeled with green texts are key framework components in each TEE-based solution. Key feature differences are annotated in each sub-graph.

Therefore, we propose LEAP, a developer-centric TrustZone solution securing critical operations of current and emerging Apps. LEAP is lightweight in design and addresses the deficiencies in existing NW-side TEE solutions on the ARM architecture. LEAP balances security strength and App usability through the six developer-centric goals below:

(S1) Secure Isolation. The App sandbox (i.e., APP ENV in LEAP) must be isolated with a hardware guarantee. The TCB cannot be modified after deployment.

(S2) Secure Peripherals. The code inside an App sandbox can easily and securely access peripherals, such as the mobile GPU, NIC, and Bluetooth, without worrying about sniffing from ROS or from code in other App sandboxes.

(S3) Secure Boot. Each App sandbox can be properly measured for integrity and verified for genuineness before booting.

(U1) Parallel Environment. LEAP can enforce the isolation of multiple parallel-running App sandboxes.

(U2) Flexible Resource. LEAP can adjust the sandbox resources on demand in order to prevent resources from being wasted or underutilized.

(U3) Easy Adoption. An auto DevOps tool is provided for App developers to conveniently adopt LEAP to protect critical executions in their Apps.

LEAP introduces four developer-centric designs. First, for an existing normal App, a DevOps tool, App Adapter, is introduced to automatically convert it into a LEAP-adapted App through static program analysis. After the conversion, the sensitive part of this App is extracted and executed in an isolated sandbox enforced by hardware. Second, multiple isolated sandboxes can run in parallel with performance almost the same as the bare-metal case, and each of them protects the code in it against high-privileged threats. LEAP's strongly-protected App execution is general enough to be deployable on the majority of existing ARM devices. Third, unlike SW Apps, normal Apps often have varying workloads at different times. LEAP's resource management allows sandboxes to adjust their computing resources, and the commonly-applied virtualization technology is not used in LEAP in order to reduce overhead on mobile devices. Last but not least, many Apps have to access peripherals, e.g., to collect sensing data, utilize GPU acceleration, or exchange data via Bluetooth. LEAP's resource management ensures that a peripheral already assigned to a sandbox cannot be accessed by any others at any time. Our exclusive peripheral feature has not been supported by previous work, and its granularity is finer than in the TrustZone case, where there are only two groups/worlds.

LEAP might also have broader impacts and shed light on future TrustZone hardware evolution. First, LEAP may boost the development of emerging Apps that have security concerns. For example, with LEAP protection, intelligent App developers do not have to worry about their locally-deployed large deep learning models. Moreover, LEAP achieves the desired goals by utilizing a small set of existing ARM hardware features, which might indicate a minimalist design of TrustZone for the mobile scenario.

In summary, our work makes the following contributions:

(1) We propose a lightweight NW-side TEE, LEAP, which balances security and usability specifically for mobile Apps. Compared to existing solutions, LEAP supports parallel isolated App execution environments featuring secure peripheral access and dynamic resource management.

(2) We also close the gap for existing Apps and emerging Apps to easily enjoy LEAP's hardware-enforced protection. For existing Apps, we provide the auto DevOps tool to make an App LEAP-ready without source code; for emerging Apps, we enable mobile GPU access and sharing inside isolated execution environments.

(3) We implement the LEAP prototype on an off-the-shelf ARM platform without any hardware change. We perform comprehensive analyses and experiments to demonstrate that LEAP is efficient in design, comprehensive in support, and convenient in adoption.

ARM TrustZone [10] is a security extension of ARM processors. As shown in Figure 2, it divides the System-on-Chip (SoC) into two worlds, namely the Normal World (NW) and the Secure World (SW), to manage the CPU, memory, and peripheral devices securely. A CPU can run in either NW or SW under the control of the NS-bit on the AXI bus. Secure boot [1] is used to ensure the integrity of the system image during the boot procedure. By configuring the TrustZone Peripheral Controller (TZPC) [11], we can isolate a peripheral, that is, prevent the device from being accessed from NW. Since ARMv8.4, the TrustZone architecture has evolved with the introduction of virtualization extensions, i.e., SEL2, in SW. With SEL2, SW can support multiple Trusted OSes in parallel. The virtualization extension can solve TEE fragmentation to a certain extent. This mechanism allows a high-security Trusted App to run in a standalone Trusted OS that is isolated not only from NW but also from other software in SW.

Figure 2: ARM TrustZone & Stage-2 Address Translation.

In the ARMv8 architecture, the CPU can execute in four different exception levels (EL0-EL3). Both worlds have the user space (EL0), the kernel space (EL1), and the virtualization extension (EL2). EL3 (monitor mode) is used to handle world switching. Please note that there is typically no hypervisor running in EL2 on mobile devices due to performance overhead, so EL2 is usually disabled during the booting procedure [1].

There are two address translation stages when the virtualization extension is enabled. In the first stage, the VM translates the virtual address (VA) to an intermediate physical address (IPA) based on its own page table. The second stage is called stage-2 translation, in which the IPA is translated to the physical address (PA). The base address of the stage-2 page tables is stored in the VTTBR_EL2 register, which can only be accessed in EL2 or a higher exception level. The hypervisor controls which PAs a VM can access by managing the stage-2 page tables. Moreover, the second stage cannot be bypassed even if the MMU is turned off by the VM. ARM offers the SMMU [14] to translate IPA to PA for devices that have the Direct Memory Access (DMA) capability. The hypervisor can manage the page tables for the SMMU and control the memory access space to prevent DMA attacks.
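The two translation stages described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions: each stage is a flat dictionary from page number to frame number rather than a multi-level ARM page table, the 4 KiB page size is assumed, and the names `translate`, `two_stage_translate`, and `TranslationFault` are hypothetical.

```python
PAGE_SIZE = 4096  # assumed 4 KiB translation granule


class TranslationFault(Exception):
    """Raised when an address has no mapping in the consulted table."""


def translate(page_table, addr):
    """Translate addr through a flat table mapping page number -> frame number."""
    page, offset = divmod(addr, PAGE_SIZE)
    if page not in page_table:
        raise TranslationFault(hex(addr))
    return page_table[page] * PAGE_SIZE + offset


def two_stage_translate(stage1, stage2, va):
    """VA -> IPA using the guest's table, then IPA -> PA using the stage-2 table."""
    ipa = translate(stage1, va)   # stage 1: controlled by the guest kernel (EL1)
    return translate(stage2, ipa)  # stage 2: controlled from EL2 or higher


# Hypothetical mappings chosen only for illustration.
stage1 = {0x10: 0x200}    # guest maps VA page 0x10 to IPA page 0x200
stage2 = {0x200: 0x8000}  # privileged software maps IPA page 0x200 to PA frame 0x8000

print(hex(two_stage_translate(stage1, stage2, 0x10 * PAGE_SIZE + 0x2C)))  # 0x800002c
```

Because the second lookup always happens, a guest that rewrites its own stage-1 table to point at an IPA page absent from the stage-2 table only triggers a fault in this model; it cannot reach physical memory that the stage-2 table does not map.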

NW-Side TEE Solutions.

The first kind of work is devoted to creating TrustZone-assisted isolation in NW to improve TrustZone's usability. Figure 1 illustrates some representative works of this type, i.e., TrustICE [39], PrivateZone [28], OSP [17], and SANCTUARY [15]. We compare these works with LEAP one by one. TrustICE designs an isolated computing environment in NW without using a hypervisor. However, when the isolated environment is running, ROS and other isolated environments are frozen. In addition, a TrustICE sandbox cannot flexibly adjust its resources on demand. PrivateZone proposes an isolated environment in NW and enables security-critical code to run in it instead of in SW. PrivateZone can maintain only one isolated environment, so code from different developers runs in one sandbox. The lack of isolation among different developers' code can cause security concerns. In addition, PrivateZone can neither flexibly adjust resources to balance the workload nor guarantee the security of peripheral access. OSP enables virtualization in NW to provide SGX-like enclaves. This work uses a hypervisor to support the enclave's isolation, and the hypervisor incurs overhead when the sensitive code is running [17]. In addition, OSP does not support flexible resource adjustment or secure peripheral access. SANCTUARY aims to provide an NW isolation environment through TZASC [9], a TrustZone hardware mechanism used to control memory access permissions. Compared with LEAP, SANCTUARY does not support secure peripheral access, flexible resources, or parallel isolation.

Another work, vTZ [24], provides a TrustZone-enhanced virtual machine (VM) design for the cloud computing scenario. The goal of vTZ is to provide each VM with a virtual TrustZone, which differs from the developer-centric design of LEAP in several ways. First, it cannot provide an isolated execution environment for Apps from different developers. Second, it enables secure peripheral access between the VM and its bound virtual TrustZone; however, for different VMs using the same peripheral, it allows the hypervisor to share the device. Third, it does not provide flexible resource management for Apps, and there must be a hypervisor running in EL2 to manage VMs, which brings non-negligible performance overhead to mobile devices [35]. LEAP works without a hypervisor and is therefore more suitable for mobile scenarios.

SW-Side TEE Solutions.

The second kind of work tries to improve the usability and security of SW. The work in [36] slices the security-critical part of an App by annotating the sensitive data in the source code and ports the sliced part into SW. TrustShadow [21] explores how to run legacy Apps in SW. It introduces a runtime that helps a legacy App run in SW without any modification. secTEE [44] proposes an enclave-like design in SW to isolate security-critical services from other SW software. TEEv [33] and PrOS [32] introduce virtualization technology to SW through software-based isolation. However, these works import third-party executable code into SW and enlarge the TCB. A larger TCB is inherently more vulnerable to compromise, and code imported from third-party developers may exacerbate the security issues. LEAP has a tamper-resistant TCB: after deployment, no executable code is added to SW.

Figure 3: LEAP System Overview. The green components are LEAP parts.

In this section, we first introduce all system components of LEAP, including their roles and functions. We then illustrate how these components interact with each other, a.k.a. the LEAP workflow, throughout the life cycle of a LEAP sandbox. In the end, we briefly highlight the key designs, which are elaborated in more detail in the following section.

Before diving into the LEAP design, we first explain our security model. We consider the scenario of protecting the execution of sensitive App code on the ARM platform with hardware security enforcement. Sensitive code (i.e., security-critical code) performs various activities, such as accessing peripherals and adjusting resources, and contains valuable App assets like closed-source deep learning models.

We assume the Rich OS in NW (ROS) could be malicious or compromised by the adversary. The goal of the adversary is to compromise the execution integrity or access the App assets under protection. We assume the driver used by the developer for peripheral access is benign and bug-free. We also assume that some sensitive code requiring our protection is curious about the execution of other sensitive code. For example, it may try to find out what sensing data others collect.

We only trust the low-level features of the ARM architecture, including secure boot, TrustZone, and stage-2 translation. Similar to previous works [24], we do not consider physical attacks such as cold boot [22] and bus monitoring attacks [25, 31], Denial-of-Service (DoS) attacks, or cache side-channel attacks [19, 20, 34, 43] in this work.

Figure 3 illustrates the high-level design of LEAP. LEAP consists of four components, i.e., LEAP ROS, LEAP SOS, LEAP SW, and LEAP ATF. They are software-based and leverage existing ARM hardware features so that LEAP can be easily deployed on existing mobile devices.

ROS is the legacy OS running in NW, e.g., Android. An App adapted to LEAP is called a pAPP, and its sensitive code under LEAP protection is called the sc-pAPP. The LEAP sandbox is a sensitive-code execution environment protecting the sc-pAPP and the LEAP SOS running inside it. The sc-pAPP is allowed to exclusively access peripherals when needed. Multiple LEAP sandboxes can run in parallel beside ROS with minimal performance impact.

LEAP ROS is a ROS kernel module. It loads images, i.e., the sc-pAPP and LEAP SOS, maintains metadata, and pre-allocates resources for the LEAP sandbox. A pAPP can create and interact with its LEAP sandbox via LEAP ROS. Before a LEAP sandbox switches peripherals or adjusts resources, LEAP ROS prepares the hardware configuration information required by LEAP SW.

LEAP SOS is a tiny kernel we tailored from Linux. It provides a minimal runtime, named the sandbox OS (SOS), inside the LEAP sandbox for a sc-pAPP. LEAP SOS interacts with LEAP ROS on behalf of the sc-pAPP for resource management. LEAP SOS also leverages the rich Linux driver ecosystem to serve the various peripheral access needs of the sc-pAPP.

LEAP SW is a kernel module in the Trusted OS (TOS) installed by the device vendor. Note that it is part of the TCB and is tamper-resistant. LEAP SW is responsible for key storage and for checking the integrity of the LEAP sandbox image before it is launched.

LEAP ATF is a patch to the vanilla ARM Trusted Firmware. It also belongs to our tamper-resistant TCB. LEAP ATF enforces LEAP sandbox isolation and exclusive peripheral access, manages the resources pre-allocated by LEAP ROS, and launches the LEAP sandbox.

Besides the system components, LEAP also provides an automatic DevOps tool for App developers. This tool, called App Adapter, makes the adaptation of an App to LEAP transparent to its developer, requiring neither source code access nor extra development effort. More details are in Section 4.1.

This part introduces the workflow of LEAP throughout the life cycle of a sandbox. We describe how to create, initialize, and terminate a LEAP sandbox (LEAP SOS), and how LEAP SOS accesses peripherals exclusively and adjusts resources.

Creation.

A LEAP-adapted App can be created directly from scratch by a developer or converted from an existing App with the assistance of our DevOps tool. In the conversion case, our tool first splits the App into two parts, the NW part (pAPP) and the security-critical part (sc-pAPP), with a clean interface between them. Next, it packs the sc-pAPP and LEAP SOS together as an encrypted image and signs it on behalf of the developer. When the LEAP-adapted App is installed, this signature is securely stored by LEAP SW for verification in the initialization stage.

Initialization.

The LEAP SOS initialization is triggered when the pAPP calls its sc-pAPP counterpart. Once LEAP ROS receives the pAPP's request, it pre-allocates resources, such as CPU cores and memory, for this LEAP SOS. Next, LEAP ROS loads the encrypted packed image, which is prepared in the creation stage, into the allocated memory and notifies LEAP ATF to lock the resources. Then, LEAP ATF asks LEAP SW to verify the integrity of the image. If the verification passes, LEAP SW decrypts the image as well. LEAP ATF then securely launches it. The sc-pAPP responds to the pAPP's request after booting. Attestation can also be performed during runtime in a way similar to previous works [28, 44].
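The ordering of the initialization steps above, where nothing is decrypted or launched unless verification passes, can be sketched as follows. This is an illustrative model only, not LEAP's implementation: the class `LeapSW` and the function `initialize_sandbox` are hypothetical names, and a SHA-256 digest recorded at install time stands in for the developer signature that the text says LEAP SW stores. Decryption itself is not modeled.

```python
import hashlib
import hmac


class LeapSW:
    """Stand-in for LEAP SW: keeps the reference measurement recorded at install time."""

    def __init__(self):
        self._measurements = {}

    def register(self, app_id, image):
        # Assumption: a plain SHA-256 digest replaces the stored signature.
        self._measurements[app_id] = hashlib.sha256(image).digest()

    def verify(self, app_id, image):
        expected = self._measurements.get(app_id)
        if expected is None:
            return False
        # Constant-time comparison of the stored and freshly computed digests.
        return hmac.compare_digest(expected, hashlib.sha256(image).digest())


def initialize_sandbox(sw, app_id, image):
    """Return the ordered steps taken; stop before decryption if verification fails."""
    steps = ["preallocate_resources", "load_image", "lock_resources"]
    if not sw.verify(app_id, image):
        steps.append("abort")
        return steps
    steps += ["decrypt_image", "launch"]
    return steps


sw = LeapSW()
sw.register("demo-app", b"encrypted sandbox image")
print(initialize_sandbox(sw, "demo-app", b"encrypted sandbox image"))
print(initialize_sandbox(sw, "demo-app", b"tampered image"))
```

In this model a tampered or unknown image ends in `abort` with the resources already locked, which mirrors the text's ordering of locking before verification.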

Peripheral Access.

ROS holds all peripheral resources by default. When a sc-pAPP wants to access a peripheral, LEAP SOS makes a request to LEAP ROS. LEAP ROS checks the status of the modules loaded in the kernel. If the peripheral is available, LEAP ROS informs LEAP ATF to unmap it from ROS and map it to the corresponding LEAP SOS via the stage-2 page table. Next, LEAP SOS loads the device driver, and the sc-pAPP in it can then use the peripheral. Note that this peripheral cannot be accessed by other LEAP SOS instances or ROS until it is released by the currently engaged LEAP SOS.

Resource Adjustment.

LEAP SOS is able to request and release resources, e.g., CPU cores and memory, on demand for the sake of efficiency and elasticity. When LEAP SOS requests more resources, LEAP ROS prepares the resources and notifies LEAP ATF to check whether these resources are secure to use. Once the check passes, LEAP ATF assigns these resources to the requesting LEAP SOS and enforces the resource isolation. When releasing resources, LEAP SOS notifies LEAP ATF to securely return the resources to ROS.
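The exclusive-ownership rule behind peripheral access and resource adjustment can be captured in a few lines of bookkeeping. The sketch below is a simplified model and not LEAP's code: `ResourceManager` and its methods are hypothetical names, a single owner table stands in for the stage-2 page table updates attributed to LEAP ATF, and the security check on requested resources is reduced to "the resource is currently held by ROS".

```python
ROS = "ROS"


class ResourceManager:
    """Toy ownership table for peripherals, CPU cores, and memory regions."""

    def __init__(self, resources):
        # By default, every resource belongs to ROS.
        self.owner = {name: ROS for name in resources}

    def assign(self, resource, sandbox):
        """Move a resource from ROS to a sandbox; refuse if it is unknown or already taken."""
        if self.owner.get(resource) != ROS:
            return False
        self.owner[resource] = sandbox
        return True

    def release(self, resource, sandbox):
        """Return a resource to ROS; only the current holder may release it."""
        if self.owner.get(resource) != sandbox:
            return False
        self.owner[resource] = ROS
        return True

    def release_all(self, sandbox):
        """Return everything held by a terminating sandbox to ROS."""
        released = [r for r, o in self.owner.items() if o == sandbox]
        for r in released:
            self.owner[r] = ROS
        return released


rm = ResourceManager(["gpu", "core1", "mem0"])
print(rm.assign("gpu", "sandboxA"))  # True: the GPU moves from ROS to sandboxA
print(rm.assign("gpu", "sandboxB"))  # False: the GPU is held exclusively
```

Keeping one owner per resource is what makes the exclusivity check trivial in this model; a second sandbox can obtain the GPU only after the first one releases it back to ROS.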

Termination.

When the sc-pAPP completes its tasks, its LEAP SOS informs LEAP ROS of its termination and asks LEAP ATF to shut it down. LEAP ROS then asks LEAP ATF to release all resources of the terminated LEAP SOS. The released resources are finally returned to ROS.

In this part, we present at a high level several key designs applied in LEAP and the principles behind them. These designs are driven by the App developer's needs. Additionally, they follow the minimalist design principle and could be an alternative to the current TrustZone hardware evolution, which is becoming more and more complicated.

Automatic App Adapter. The tedious DevOps experience is one of the key reasons why TrustZone and TEE-based solutions are not popular among App developers. Furthermore, many developers may not be familiar with system programming. Thus, we introduce an auto DevOps tool to transform an App, even without source code, into a LEAP-ready App. Technical details are explained in the next section.

Isolated Parallel Execution. Unlike the vendor-centric workload, App workloads make it necessary to enable multiple App sandboxes to run in parallel. At the same time, we want to keep the code in SW, which is part of our TCB, minimal and fixed. Additionally, we rely on hardware security features to fight against high-privileged threats. Thus, the attack surface is reduced. Technical details are explained in the next section.

Exclusive Peripheral Management. We design a lightweight mechanism to guarantee that a sc-pAPP can access rich peripherals exclusively. Moreover, App developers do not have to worry about the availability of peripheral drivers. By contrast, only a few device drivers have been implemented for OSes in SW due to the huge development effort required. Technical details are explained in the next section.

Flexible Resources Adjustment. The computing resources, i.e., CPU cores and memory, inside a LEAP sandbox can be dynamically adjusted upon requests from the corresponding App in NW. This is challenging to achieve given that we avoid resource virtualization to gain IO efficiency inside the App sandbox.

Figure 4: The processing pipeline of the App adapter.

The App developer-centric design realized by LEAP primarily relies on four techniques: the automatic App adapter used offline for App preparation, the isolated parallel execution used online after App installation, the exclusive peripheral management used for secure IO, and the flexible resources adjustment used for resource allocation during runtime. Our isolated parallel execution is achieved by leveraging only a small set of existing ARM hardware features, namely the stage-2 translation, the ARM monitor mode, and SEL1 (EL1 in SW), so that this design is easily applicable to existing ARM devices, building a foundation for evolving TrustZone toward a developer-centric TEE.

This App adapter is designed to minimize the development effort of applying LEAP protection to an existing App. The automation it offers largely eliminates the adaptation cost concern of non-expert developers. It is intended to demonstrate why DevOps should be considered in a developer-centric TEE, so it does not cover all DevOps demands. We plan to make it more complete in the future.

Figure 4 illustrates the processing pipeline of the App adapter. To convert an App, its developer only has to prepare a configuration file pointing out the entry points of the sensitive code. For example, if a developer wants to protect a valuable deep learning model together with the corresponding inference code in her App, she just lists the APIs triggering the inference task in the configuration file for our App adapter. In this file, entry points are listed line by line in the format < the class of the function definition: the function prototype >. Then our App adapter primarily performs two tasks. The first task is to extract the indicated sensitive code, i.e., the sc-pAPP, from the targeted regular App, while the second task is to repack the sc-pAPP for running in the LEAP sandbox.

More concretely, the AppSli module performs call graph analysis and data flow analysis on the App and extracts the security-critical part, i.e., all code called by entry points.

LibGen generates a dynamic linking library responsible for the communication between the normal part and the security-critical part according to the entry points in the configuration file and the sliced code. Next, the generated library and the App's normal part are repacked as a pAPP to run on ROS. Therefore, all runtime communications between the normal and the security-critical parts are forwarded through the generated dynamic linking library. As to the security-critical part, the App adapter compiles it into an executable Java program, packs the Java program with a LEAP SOS, and encrypts it to produce a LEAP sandbox image. The encrypted image is signed for integrity verification during secure boot. The signature and the decryption key of the encrypted image are stored in LEAP SW when the whole App is installed onto the user device. The encrypted image is stored on the disk. We provide more technical details about AppSli and LibGen as follows.

The AppSli module is built upon the Java optimization framework Soot [41]. Soot is suited for performing static analysis and instrumentation of Android Apps. We first decompile the App and locate all targeted entry points. We then build call graphs of the App and traverse all code reachable from these entry points. We also perform backward data-flow analysis to maintain the dependencies of the traversed code. For example, if a developer-defined object type is used in the traversed code, we need to keep a copy of the class definition with the traversed code. By iteratively performing backward data-flow analyses, all security-critical code can be found and made ready for repackaging.

The LibGen module is used to produce a dynamic linking library, i.e., a communication proxy, which is integrated with the normal part of an App and connects with the corresponding security-critical part. In this library, one component is the code to create and initialize the LEAP sandbox. The sandbox creation functions first notify LEAP ROS to prepare one CPU core and a default 256 MB of memory to launch the sandbox. Then LEAP SW verifies the integrity of the prepared image containing the sc-pAPP before booting. The other component generates all new entry points for passing parameters between the pAPP and the sc-pAPP, relying on LEAP ROS. We present an example in Figure 4. For the entry point provided by the developer, LibGen generates a corresponding function. This generated function can pass the input data to the sc-pAPP through the APIs provided by LEAP ROS. When packing the pAPP, all calls to entry points of the original App are replaced with calls to the generated ones.
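The call-graph traversal that AppSli performs from the configured entry points can be sketched as a breadth-first reachability search. This is a minimal illustration, not the Soot-based implementation: the configuration line, the class and method names, and the hand-written call graph are all hypothetical, and the backward data-flow step that pulls in class definitions is not modeled.

```python
from collections import deque


def parse_entry_points(config_text):
    """Parse lines of the form '<class: prototype>' into (class, prototype) pairs."""
    entries = []
    for line in config_text.splitlines():
        line = line.strip()
        if not line:
            continue
        body = line.strip("<>").strip()
        cls, proto = body.split(":", 1)
        entries.append((cls.strip(), proto.strip()))
    return entries


def reachable(call_graph, entry_points):
    """Return every method reachable from the entry points (entry points included)."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        method = queue.popleft()
        for callee in call_graph.get(method, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


# Hypothetical configuration file content and call graph.
config = "<com.example.Model: float[] infer(float[])>\n"
infer = ("com.example.Model", "float[] infer(float[])")
load = ("com.example.Model", "void loadWeights()")
matmul = ("com.example.Tensor", "float[] matmul(float[])")
render = ("com.example.Ui", "void render()")
call_graph = {infer: [load, matmul], load: [matmul], render: [infer]}

sliced = reachable(call_graph, parse_entry_points(config))
print(sorted(sliced))
```

Here `render` calls the entry point but is not reachable from it, so it stays in the normal part; in the pipeline described above, that call would be redirected to the proxy function that LibGen generates.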

It is not straightforward to design an isolated parallel environment, especially when both efficiency and security are considered. Figure 5 illustrates some current works that can (or can be extended to) support parallel isolation, but they have deficiencies in terms of efficiency and security. A virtualization-based design in NW requires a hypervisor that brings system overhead to the mobile device when the sc-APP is running [17, 24]. SANCTUARY [15] is a TZASC-based solution. It utilizes the hardware features of the latest TZASC, e.g., TZC-400 [12], to ensure CPU and memory isolation. However, SANCTUARY cannot run sandboxes in parallel. Even if we improve SANCTUARY, the parallelism of this solution is still limited by the hardware, because TZC-400 can only reserve a limited number of memory regions. Note that there are a number of differences between LEAP and these current works. We only discuss the differences in terms of parallel isolation here. The comprehensive comparison is presented in Section 2.3.

Figure 5 also shows LEAP's design of parallel isolated execution. LEAP provides stage-2 page table based isolation for sandboxes. Each sandbox has exclusive resources, e.g., CPU cores and memory regions, which shares a similar idea with NoHype [30]. The exclusive resources can only be accessed by the software in the corresponding sandbox. Next, we show how we design parallel isolation in terms of the differences between LEAP and existing works.

LEAP applies stage-2 translation to ensure isolation. However, unlike virtualization-based solutions, we do not introduce a hypervisor to virtualize any resources, i.e., CPU or memory. Enabling virtualization may incur non-negligible performance overhead [35], which is prohibitive for resource-limited mobile devices. LEAP enables and manages the stage-2 page tables through LEAP ATF in EL3, which is a patch of ATF shown in Figure 3. To enable stage-2 translation without a hypervisor, LEAP ATF lets the CPU return directly to EL1 when it leaves EL3, instead of returning to EL1 via EL2. Before booting ROS, LEAP ATF reserves a memory region to store the stage-2 page tables. This region is kept out of reach of both ROS and the sandbox OS. LEAP ATF ensures that ROS and every sandbox can access only their own resources by masking the corresponding stage-2 page table entries. Moreover, all resource and peripheral accesses are also controlled by these page tables, which will be introduced in Section 4.3.

SW-based virtualization solutions like TEEv need to install software into the SW. The installation increases the TCB size, which inherently brings security risks, not to mention that the installed code comes from third parties. LEAP has a tamper-resistant TCB. That is, LEAP does not need to load any executable code into the TCB after LEAP's deployment. We achieve this by following the principle of least privilege. Specifically, we place only the environment isolation and resource access management in LEAP ATF, and place the services for secure boot, i.e., key storage, encryption/decryption, and hashing, in LEAP SW. All services provided by developers are confined to the LEAP sandbox through hardware-enhanced isolation. The small and tamper-resistant TCB makes our design safer than the SW-based virtualization works.

Besides the tamper-resistant TCB, we further enhance system security by proposing a cache protection mechanism. NW-based memory isolation solutions are vulnerable to the cache direct attack. For example, if an attacker accesses the memory region that is being prepared for a sandbox, the memory content will be cached in a cache line. As the L2 cache is usually shared within one cluster, the attacker can directly read the memory content from the cache line. SANCTUARY defends against this attack either by seeking a hardware change, which is not available on current hardware, or by simply disabling the L2 cache, which decreases system performance. LEAP solves this by proposing a cache sanitization technique. Since the cache is usually physically indexed on ARMv8 [13], LEAP prevents the attacker from successfully translating the virtual address to the physical address, which can be achieved by unmapping stage-2 page table entries. However, the stage-2 translation entries may be cached in the TLB. Therefore, before booting a sandbox or adding a memory region to a LEAP sandbox, LEAP ATF clears the TLB entries that map to the newly prepared memory space, which has almost no impact on system performance in practical use.

Figure 5: Current solutions to support isolated executions. The sc-APP is the security-critical (part of an) APP. The virtualization-based design in the Normal World supports parallel isolated sandboxes through a hypervisor, e.g., OSP [17], vTZ [24]. SANCTUARY [15] is a TZASC-based solution. It cannot support running multiple sandboxes in parallel. The last one is the high-level design of our method. (A detailed design of LEAP can be found in Figure 3.) We detail the difference of LEAP from these works in Section 2.3.

We design an integrity verification mechanism to ensure the secure boot of LEAP sandboxes (S3). LEAP starts a sandbox once the pAPP requests the sc-pAPP's services. Before launching the sandbox, LEAP verifies the integrity of the encrypted runtime image, which contains the LEAP SOS code and the sc-pAPP. The signature of the runtime image is produced in the creation stage and securely stored by LEAP SW. LEAP SW is responsible for performing the integrity verification of the image. Only when the verification passes does LEAP ATF boot the LEAP sandbox.
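The verify-then-boot flow can be sketched as follows. The paper does not specify the signature scheme, so this sketch assumes an HMAC-SHA256 tag computed with a key that never leaves LEAP SW; the function names `make_tag` and `verify_then_boot` are hypothetical.

```python
import hashlib
import hmac


def make_tag(image: bytes, key: bytes) -> bytes:
    """Creation stage: compute the tag that would be stored by LEAP SW.

    Assumption: HMAC-SHA256 stands in for the unspecified signature scheme.
    """
    return hmac.new(key, image, hashlib.sha256).digest()


def verify_then_boot(image: bytes, stored_tag: bytes, key: bytes) -> bool:
    """Launch stage: return True only if the image matches the stored tag.

    In LEAP the boot itself would be carried out by LEAP ATF after this
    check passes; here the boolean simply models that decision.
    """
    expected = make_tag(image, key)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(expected, stored_tag)


# Hypothetical key and image contents, for illustration only.
key = b"key-held-only-in-the-secure-world"
image = b"LEAP SOS kernel + sc-pAPP"
tag = make_tag(image, key)
```

With these values, `verify_then_boot(image, tag, key)` accepts the original image, while any modified image is rejected because its recomputed tag no longer matches the stored one.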

It is non-trivial to design a peripheral management mechanism that satisfies both IO security and usability, e.g., development effort and efficiency. We show the design challenges by proposing two straw-man solutions (Figure 6) and explaining why they fail to meet the peripheral management requirements.

Straw-man Solution 1.

The first possible design is to redirect all peripheral IO to the Secure World and leverage the hardware-assisted Secure IO. As shown in Figure 6a, all devices are mapped into the Secure World, and all device management modules, e.g., device drivers, are installed into the Trusted OS. This seemingly simple solution has two serious design flaws.

Usability.

All peripheral drivers needed by App developers must be installed into the Secure World in this design. This is a hard task, if not an impossible one. Porting or implementing a driver for special Trusted OSs like OP-TEE [8] is difficult and time-consuming even if the corresponding driver for ROSs like Android is open-source. In reality, peripheral drivers are often very complicated and closed-source, rendering Secure-World driver porting or development impossible. Additionally, the system programming effort for arbitrary IO redirection is heavy as well, given that there are so many types of peripheral driver implementations. Therefore, this design puts too much burden on the shoulders of application developers.

Security.

Another important reason is that adding such drivers to the TOS enlarges the TCB.

Straw-man Solution 2.

The second possible design is to introduce a driver monitor module, shown in Figure 6b, to ensure that only one Normal-World driver is enabled for a device at a time. When a driver wants to use a device, it makes a request to the driver monitor. Once allowed, it is enabled and can access the requested device. The driver monitor keeps scanning the Normal World to detect whether any driver works illicitly. This design also has two problems.

Usability.

It is very costly to scan the kernel memory to detect whether any driver is working. As reported in DeepMem [38], recognizing a kernel object takes about 13 seconds even in a PC environment, whose computation ability is more powerful than that of mobile devices. The overhead of such a design is not acceptable.

Security.

The Normal World OS, e.g., ROS or LEAP SOS, might access a peripheral by directly reading or writing a specific IO address without using a driver. That is, any device access that does not go through a driver bypasses the driver monitor and defeats this method.

Our design follows three principles. (1) Developers should be able to directly reuse all off-the-shelf drivers of the Linux ecosystem in the LEAP sandbox, as in the Normal World. That is, the sc-pAPP can access any peripheral from the sandbox. (2) Peripheral access should be lightweight and efficient. The overhead of peripheral access from the sandbox should not be greater than that from ROS. (3) Only one Normal-World execution, i.e., ROS or a sandbox, can access a peripheral at a time. If one device can be accessed from different executions at the same time, IO security will be compromised.

Figure 6c illustrates our peripheral management mechanism. Our design abides by all three principles by manipulating the stage-2 page tables. The stage-2 page tables are normally used to enforce memory isolation. However, the key observation of our design is that ARM adopts Memory Mapped IO (MMIO), which gives us the opportunity to control IO access by managing stage-2 page tables. Recall that multiple sandboxes and ROS may run in parallel in LEAP on different cores; LEAP ATF sets different stage-2 page tables for each of them. When in use, a LEAP sandbox can request LEAP ATF to use some device. LEAP ATF assigns the device to the requester by modifying its stage-2 page tables on the fly. If the requested peripheral is occupied by another execution, the requester has to wait until the device is available before it can gain access permission. When a sandbox uses a device, the page table entries for this device in all other sandboxes and in ROS are marked as invalid to ensure exclusive access.

The stage-2 page tables that control peripheral access are stored in a block of physical memory reserved by LEAP ATF. This memory region is never mapped to ROS or LEAP SOS, to prevent them from accessing it. The stage-2 page tables use 2MB mappings for the memory space and 4KB mappings for the IO space. The page tables of each execution use less than a 2MB memory region to address memory and peripherals. In our prototype system, there are 8 CPU cores, so the reserved memory region is only 16MB, which can support ROS and at most 7 sandboxes running in parallel.

Figure 6: Straw-man Solutions of Peripheral Management and Our Solution. (a) Straw-man Solution 1. All peripheral accesses are forwarded to the Secure World. (b) Straw-man Solution 2. The Secure World is responsible for ensuring that only one Normal World driver can access the device at a time. (c) Our Solution. Exclusive peripheral management designed by LEAP. The core idea is using stage-2 page tables to ensure that only one Normal-World driver can access a certain peripheral at a time. Note that only our solution can meet the security and usability requirements in our scenario.
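The exclusive-assignment rule can be modeled with a toy sketch. It abstracts each execution's stage-2 page table as a set of mapped device names, so it illustrates the policy only and does not touch real page tables. `Stage2Manager`, `reserved_table_memory`, and the assumption that devices start unassigned are all hypothetical.

```python
from typing import Dict, List, Optional, Set

MIB = 1024 * 1024


class Stage2Manager:
    """Toy model: one set of valid device mappings per execution."""

    def __init__(self, executions: List[str], devices: List[str]) -> None:
        self.mapped: Dict[str, Set[str]] = {e: set() for e in executions}
        # Assumption: every device starts unassigned.
        self.owner: Dict[str, Optional[str]] = {d: None for d in devices}

    def request(self, execution: str, device: str) -> bool:
        """Grant the device if it is free; otherwise the requester must wait."""
        current = self.owner[device]
        if current is not None and current != execution:
            return False
        # Mark the device invalid everywhere, then valid for the requester only.
        for mappings in self.mapped.values():
            mappings.discard(device)
        self.mapped[execution].add(device)
        self.owner[device] = execution
        return True

    def release(self, execution: str, device: str) -> None:
        if self.owner[device] == execution:
            self.mapped[execution].discard(device)
            self.owner[device] = None

    def can_access(self, execution: str, device: str) -> bool:
        return device in self.mapped[execution]


def reserved_table_memory(executions: int, per_execution: int = 2 * MIB) -> int:
    """Upper bound on page-table memory: at most 2MB per execution."""
    return executions * per_execution


mgr = Stage2Manager(["ROS", "sandbox1", "sandbox2"], ["wifi"])
```

For the prototype's 8 cores (ROS plus at most 7 sandboxes), `reserved_table_memory(8)` evaluates to 16MB, matching the reserved region size stated above.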

Dynamic CPU core and memory adjustment can effectively balance the system workload and improve the utilization of system resources. We detail the resource management in two parts, i.e., dynamic memory adjustment and CPU core adjustment.

LEAP proposes two mechanisms, i.e., the zero consumption policy and memory pool sharing, to allocate LEAP memory. The zero consumption policy is responsible for the preparation and on-the-fly adjustment of LEAP SOS memory. Memory pool sharing is used to manage the shared memory between ROS and LEAP sandboxes. ROS communicates with a LEAP sandbox through the shared memory. Figure 7 illustrates these memory management schemes.

When booting a new sandbox, ROS pre-allocates a memory region of the default size. LEAP ATF then sanitizes the pre-allocated memory and assigns it to the corresponding sandbox. When the LEAP sandbox needs to increase or decrease its memory size, LEAP ATF performs the adjustment following the zero consumption policy. The zero consumption policy ensures that the sandbox's memory region is always physically contiguous, no matter how many adjustments are performed. Ensuring the physical contiguity of the memory region reduces system maintenance costs and the complexity of the TCB. A trivial method to ensure contiguous memory is to reserve a large block of memory for the sandbox. But the reserved memory cannot be used by ROS, which wastes system resources when no sandbox is running. We apply the Linux Contiguous Memory Allocator [3] (CMA) to manage the memory for the LEAP sandbox. It can allocate memory sections that are physically adjacent to the sandbox, and that are used by ROS by default, to the sandbox.

Memory pool sharing maintains all communication channels between ROS and LEAP sandboxes. The channels are implemented through shared memory and are located in the same contiguous memory region. LEAP ROS allocates a new shared memory region from this pool whenever it boots a new LEAP sandbox. Each sandbox has an exclusive communication channel. To prevent a LEAP sandbox from accessing the communication channels of others, LEAP ATF does not map those channels to this sandbox, with the access control guaranteed by the stage-2 page tables.

Figure 7: LEAP sandbox memory layout and dynamic memory scheme design.
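The contiguity invariant behind the zero consumption policy can be illustrated with a toy block model. This is a sketch under simplifying assumptions, not the Linux CMA interface: physical memory is a list of block owners, a sandbox may grow only into the blocks immediately after its region, and growth simply fails when those blocks belong to another sandbox. `ContiguousRegion` and the owner label `"ros"` are hypothetical.

```python
from typing import List

FREE = "ros"  # blocks that ROS may use while no sandbox owns them


class ContiguousRegion:
    """A sandbox memory region that only grows or shrinks at its tail."""

    def __init__(self, memory: List[str], owner: str, start: int, blocks: int) -> None:
        if any(o != FREE for o in memory[start:start + blocks]):
            raise ValueError("initial region overlaps memory that is in use")
        self.memory, self.owner = memory, owner
        self.start, self.end = start, start + blocks
        for i in range(self.start, self.end):
            memory[i] = owner

    def grow(self, blocks: int) -> bool:
        new_end = self.end + blocks
        if new_end > len(self.memory):
            return False
        # Only physically adjacent blocks are acceptable.
        if any(o != FREE for o in self.memory[self.end:new_end]):
            return False
        for i in range(self.end, new_end):
            self.memory[i] = self.owner
        self.end = new_end
        return True

    def shrink(self, blocks: int) -> bool:
        if blocks <= 0 or blocks >= self.end - self.start:
            return False  # keep at least one block
        for i in range(self.end - blocks, self.end):
            self.memory[i] = FREE  # handed back to ROS
        self.end -= blocks
        return True

    def is_contiguous(self) -> bool:
        owned = [i for i, o in enumerate(self.memory) if o == self.owner]
        return owned == list(range(self.start, self.end))


memory = [FREE] * 8
first = ContiguousRegion(memory, "sandbox1", start=0, blocks=2)
second = ContiguousRegion(memory, "sandbox2", start=4, blocks=2)
```

Because every adjustment happens at the tail of the region, `is_contiguous()` holds after any sequence of successful `grow` and `shrink` calls, and no memory has to stay reserved while a sandbox is small.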

A LEAP sandbox is assigned one CPU core by default at startup. However, a LEAP sandbox is allowed to request more cores from ROS on demand while running. This dynamic CPU allocation design achieves a good system workload balance. When adjusting the CPU cores, the sc-pAPP can request a big core or a little core according to its needs, to optimize overall execution and energy consumption.

The adjustment procedure is as follows. The sc-pAPP issues a request to adjust its CPU cores. LEAP ROS checks whether there is an available core. If a core is available, LEAP ROS notifies LEAP ATF to remove the core from ROS through the Linux CPU hot-plug [4] mechanism. LEAP ATF clears the core's cache to prevent data leakage and securely shuts down the core. In the end, LEAP SOS requests the core from LEAP ATF, and LEAP ATF initializes the core with the correct context and boots the core for LEAP SOS.

The sc-pAPP can give up surplus cores that it occupies through a similar procedure. Also, ROS can request core adjustment from LEAP SOS. If LEAP SOS finds that there is a free core, it can return the core to ROS. LEAP SOS always holds at least one core so that the sc-pAPP can execute, and LEAP ATF ensures data security when switching cores between ROS and a sandbox.
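The hand-over steps above can be sketched as a small state model. It only records the order of operations; nothing here performs real hot-plug or cache maintenance. `CoreManager` is a hypothetical name, and applying the "keep at least one core" rule to ROS as well as to the sandbox is an assumption of this sketch (the text states it only for LEAP SOS).

```python
from typing import Dict, List


class CoreManager:
    """Toy model of moving a CPU core between ROS and a sandbox."""

    def __init__(self, owners: Dict[int, str]) -> None:
        self.owners = dict(owners)  # core id -> current owner
        self.log: List[str] = []    # ordered record of the steps taken

    def cores_of(self, owner: str) -> List[int]:
        return sorted(c for c, o in self.owners.items() if o == owner)

    def move_core(self, core: int, src: str, dst: str) -> bool:
        if self.owners.get(core) != src:
            return False  # the core does not belong to the stated source
        if len(self.cores_of(src)) <= 1:
            return False  # the source must keep at least one core
        self.log.append(f"hot-unplug core {core} from {src}")
        self.log.append(f"clear cache and shut down core {core}")
        self.owners[core] = dst
        self.log.append(f"initialize context and boot core {core} for {dst}")
        return True


cores = CoreManager({0: "ROS", 1: "ROS", 2: "sandbox"})
```

Calling `cores.move_core(1, "ROS", "sandbox")` models the sc-pAPP gaining a core; the reverse call models returning a surplus core to ROS, and either direction is refused when it would leave the source with no core.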

In this section, we discuss how LEAP defends against possible attacks under our security model (see Section 3.1). Since LEAP provides hardware-assisted isolation for applications, malicious application code, whether it is in the ROS user space or in a LEAP sandbox, cannot access data or compromise executions in another LEAP sandbox. Therefore, our security analysis mainly focuses on a compromised ROS.

Malicious LEAP ROS Manipulation. A compromised ROS can manipulate the LEAP ROS installed by LEAP. LEAP ROS is responsible for preparing the sandbox image and pre-allocating the resources, so malicious manipulations lie in sandbox creation and resource management. When creating a new sandbox, the compromised LEAP ROS may prepare malicious LEAP SOS and sc-pAPP images to compromise secure services. LEAP copes with this attack through a Secure Boot mechanism (see Section 3), which ensures the integrity of the LEAP sandbox images before launching them. The malicious ROS can also misconfigure resources during resource adjustment. To be specific, when a sandbox increases its memory, ROS can maliciously prepare for the requester a memory region that is already used by another sandbox. LEAP defeats this kind of attack by checking the legitimacy of the configurations in LEAP ATF (see Section 4.4). Similarly, through verification when creating sandboxes or adjusting CPU cores, LEAP ATF also ensures that a compromised ROS cannot allocate a CPU core that is already used by one sandbox to another.
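The legitimacy checks can be pictured as two small predicates. This is a minimal sketch, assuming memory regions are half-open physical address ranges and that the checker has a trusted record of what each sandbox currently holds; the function names and data layout are hypothetical and are not taken from the LEAP ATF code.

```python
from typing import Dict, List, Tuple

Region = Tuple[int, int]  # half-open physical range [start, end)


def overlaps(a: Region, b: Region) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def is_legitimate_region(candidate: Region, requester: str,
                         in_use: Dict[str, List[Region]]) -> bool:
    """Reject memory that ROS proposes if another sandbox already owns part of it."""
    for owner, regions in in_use.items():
        if owner == requester:
            continue
        if any(overlaps(candidate, region) for region in regions):
            return False
    return True


def is_legitimate_core(core: int, requester: str,
                       core_owner: Dict[int, str]) -> bool:
    """Reject a core that is already assigned to a different sandbox."""
    owner = core_owner.get(core)
    return owner is None or owner == requester


# Hypothetical trusted records kept by the checker.
regions_in_use = {"sandbox1": [(0x1000, 0x2000)]}
cores_in_use = {3: "sandbox1"}
```

Under these records, a region that overlaps `sandbox1`'s range is refused for `sandbox2`, while a region starting exactly at `0x2000` is accepted because the ranges are half-open.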

Peripheral IO Eavesdropping. The compromised ROS cannot successfully access the IO addresses of a peripheral occupied by a LEAP sandbox, because these addresses are blocked in the stage-2 address translation, which is controlled by LEAP ATF. At the same time, the IO address translation for this device is also blocked for other LEAP sandboxes. Thus, one LEAP sandbox cannot successfully perform IO eavesdropping on other sandboxes, either. For devices capable of DMA, LEAP, in addition to using the same method to block peripheral DMA, relies on ARM's SMMU [14] to prevent DMA from bypassing the main memory access control. Thus, a compromised ROS cannot eavesdrop on the data in a peripheral occupied by a LEAP sandbox.

Cache Direct Attack. A compromised ROS may access the memory region that is about to be allocated to a sandbox in order to bring it into the L2 cache. After the memory adjustment, the compromised ROS then tries to access the sandbox's memory space through the L2 cache. LEAP proposes a cache sanitization technique (see Section 4.2) to defend against this kind of attack by clearing the CPU cores' TLB entries related to the newly allocated memory. For different LEAP sandboxes, the memory space that belongs to one sandbox is never mapped to other sandboxes. So one sandbox cannot directly access the address space of another sandbox, nor can it read the memory space of another sandbox through the cache, because it can never successfully translate an address that belongs to others into a valid physical address, which is required by L2 cache indexing.

In this section, we describe the experimental setup, followed by a comprehensive evaluation of LEAP that answers the following three questions:
(1) How much performance improvement is brought by the parallel isolation design?
(2) How does the flexible resource design help the sandbox balance the workload?
(3) How does our exclusive peripheral design perform when accessing peripherals?
Last, the case study of a real-world GPU-accelerated machine learning application demonstrates how easily and efficiently an application can run in LEAP.

Hardware.

We implemented a prototype of LEAP on Hikey960, a widely used development board with the same SoC as many COTS smartphones such as the Huawei P10. The board is equipped with eight cores (4 Cortex-A53 + 4 Cortex-A73) in a big.LITTLE architecture, 4GB of physical memory, of which the 3.5GB to 4GB address range is used as peripheral I/O address space, a Mali-G71 GPU, Bluetooth 4.1, and a WiFi module.

Software.

Android 9.0.0_r31 (kernel version 4.14) and the popular open-source Trusted OS OP-TEE (v3.4.0) [8] were chosen as LEAP's ROS and TOS, respectively. We used the standard ARM Trusted Firmware patched with LEAP ATF in EL3. The whole LEAP system has 4,689 lines of code (LOC), including LEAP ATF (539 LOC), LEAP SW (651 LOC), LEAP ROS (1,327 LOC), LEAP SOS (972 LOC), and DevOps (1,200 LOC). LEAP SOS, the LEAP sandbox's OS, uses a customized Linux kernel (v3.13) whose size is only about 12MB. To further reduce the boot time of LEAP SOS, we made three additional engineering changes. First, the initialization code of the GIC was removed, since there is no need for a sandbox to initialize the GIC, which has already been done by ROS. Second, the code that sets the system clocks was removed to prevent the sandbox from resetting the system clocks when switching devices. Last, the in-memory file system ramfs was used in LEAP SOS. The sandbox's file operations are assisted by LEAP ROS.

Methodology.

We compared LEAP to a basic TrustZone-based TEE and a SANCTUARY [15] prototype, because the former is the native implementation on existing devices and the latter is the current state of the art. The software environment, i.e., ROS, TOS, and ATF, of the three prototypes is exactly the same. Since SANCTUARY is not open-sourced, we reproduced it following the paper [15] with one modification for a fair comparison. SANCTUARY uses a micro-kernel OS in its sandbox, while the reproduced SANCTUARY prototype uses the same sandbox OS as LEAP. Note that this modification does not affect its design idea. All three prototypes used the same hardware settings. Since Android automatically adjusts the CPU frequency according to the system workload and CPU temperature, which makes CPU performance vary, we fixed the CPU frequency at the maximum frequency and let the CPU cool down between experiments.

Figure 8: The total decryption time with different data sizes for 3 Apps; smaller is better. LEAP-1Ins., LEAP-2Ins., and LEAP-3Ins. denote that 1, 2, and 3 sandboxes are instantiated, respectively.

We implemented a decryption App based on mbedtls [6] to measure the performance of the parallel execution of LEAP sandboxes. The App takes an encrypted data stream as input and decrypts the data securely using its secret key. The encryption spec was set to AES-256-CBC with a 64KB block size. We measured the total time cost when the decryption App was requested by 3 other Apps for data decryption. In TrustZone and SANCTUARY, there is no parallel environment, so the decryption has to be performed in turn. In LEAP, we instantiated one, two, and three sandboxes, respectively, to test this operation.

The experiment was performed with different data sizes, and the results are shown in Figure 8. Thanks to the parallel design, LEAP with 3 sandbox instances (LEAP-3Ins.) outperforms TrustZone by 7.52× to 7.54× and SANCTUARY by 2.95× to 3.71×. Also, LEAP-3Ins. performs about 1.72× to 2.24× better than LEAP-2Ins. When only one sandbox is instantiated, i.e., LEAP-1Ins., it achieves performance similar to SANCTUARY, since the tasks are also executed in turn. This indicates that our stage-2 translation based isolation incurs negligible overhead. We also notice that TrustZone performs much worse than LEAP and SANCTUARY as the file size increases. This is mainly because OP-TEE does not have its own scheduler and is scheduled by ROS instead, so the longer a TA runs, the more CPU context switches are performed between the two worlds.

To demonstrate the benefits of flexible resources, we implemented the following two Apps and evaluated their performance with different workloads. The results show that our flexible resource design enables high application performance as well as high resource utilization.

Figure 9: Data signing application performance with a frequency of 8 requests/s. The request queue length at every second is recorded; smaller is better.

Table 1: The execution time and resource utilization rate with different memory allocation strategies. The resource utilization rate is calculated as the actually used resources divided by the allocated resources during the query App execution.

Memory Size (MB)    Time (s)    Resource utilization
30                  35.50       98.98%
50                  34.43       96.37%
60                  27.76       93.43%
80                  22.58       85.52%
100                 17.63       70.39%

Data signing App.

We implemented a data signing App, which accepts a 4KB data stream, computes the SHA-256 digest of the data, and finally signs the digest with an RSA-2048 private key. The data signing App runs two threads that share a ring queue to handle requests. One thread is responsible for enqueueing the requests; the other thread processes the requests in the queue and dequeues a request when one signing task is completed.

We deployed the data signing App to the three prototypes. In TrustZone, because OP-TEE does not provide multi-threading to TAs, we had to implement the request queue in the NW. In LEAP, we added one modification: when the queue length exceeds 5, the processing thread requests a temporary additional CPU core to help process the requests, and it returns the CPU core to ROS when the queue length is 0. This modification benefits from LEAP's flexible resource adjustment design.

We evaluated the processing efficiency of the data signing App in two ways. First, we set the request frequency to 8 requests/s and observed the queue length at every second. Second, we varied the request frequency from 6 requests/s to 10 requests/s and observed the queue length at the fifth second for each frequency.

The experimental results are shown in Figure 9 and Figure 10. LEAP performs well under all tested workloads. In Figure 9, when the request frequency is set to 8 requests/s, the processing speed of TrustZone and SANCTUARY cannot keep up with the request speed, so the queue length grows every second. For LEAP, the queue length also increases in the first two seconds, but it starts to decrease from the third second. This is because LEAP dynamically requests one CPU core from ROS to help process the requests. In Figure 10, when the request frequency is low (6 requests/s), each new request can be processed in time by SANCTUARY and LEAP, so the queue length is 0 at the fifth second. However, TrustZone cannot fully catch up with the request speed, and its queue length is 8. This is also because of the context switch overhead. As the request frequency increases, the processing speed of SANCTUARY also begins to fall behind the request speed, and the queue lengths of both TrustZone and SANCTUARY increase quickly. However, LEAP always keeps a low and stable queue length even under a high request frequency. The result indicates that LEAP enables the sc-pAPP to adjust its computing resources in time under different workloads.

Figure 10: Data signing App performance with different request frequencies. The request queue length at the fifth second is presented; smaller is better.
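The core-scaling rule used by the modified signing App (request a core when the queue exceeds 5, return it when the queue is empty) can be sketched as below. The thresholds come from the text; the function names and the service rates passed to `simulate` are hypothetical illustration values, not measurements from the paper, so the simulated queue lengths do not reproduce Figure 9 or Figure 10.

```python
from typing import List


def next_core_action(queue_length: int, has_extra_core: bool,
                     threshold: int = 5) -> str:
    """Decide what the processing thread does after observing the queue."""
    if not has_extra_core and queue_length > threshold:
        return "request_core"
    if has_extra_core and queue_length == 0:
        return "return_core"
    return "keep"


def simulate(arrivals_per_s: int, base_rate: int, boosted_rate: int,
             seconds: int) -> List[int]:
    """Queue length at the end of each second under the rule above.

    base_rate and boosted_rate are hypothetical requests/s served with one
    core and with the extra core, respectively.
    """
    queue, extra, history = 0, False, []
    for _ in range(seconds):
        rate = boosted_rate if extra else base_rate
        queue = max(0, queue + arrivals_per_s - rate)
        action = next_core_action(queue, extra)
        if action == "request_core":
            extra = True
        elif action == "return_core":
            extra = False
        history.append(queue)
    return history
```

The two thresholds form a simple hysteresis: the extra core is held until the backlog is fully drained, which avoids repeatedly paying the core hand-over cost when the queue hovers around the upper threshold.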

Encrypted data query App.

The second security App we tested is a ciphertext query App that accepts a key provided by a user as the query keyword, performs the query in an encrypted file with key-value data, and returns the results to the user. To speed up the query procedure, the query App caches the decrypted data in memory. Only when the key is not found in memory does it load the encrypted file from the disk and cache it in memory. The encryption method we chose is the same as the secure storage encryption method provided by OP-TEE, which uses AES-128-CBC to encrypt files, and the size of each encrypted block is 256 bytes.

We generated 10 encrypted files containing different key-value pairs with different sizes, ranging from 10MB to 100MB, and we also randomly generated 10 query sequences for each file. We measured the time to complete 10 queries for each file and recorded the total time cost to complete all queries for the 10 files. In OP-TEE, the query App was based on its secure storage TA, and we used a 10MB memory size as the file cache space, as the total available memory space for TAs does not exceed 16MB. In both SANCTUARY and LEAP, we set the memory size to 10MB. However, the query App on LEAP can dynamically adjust its memory size, at a 16MB granularity, to handle files with different sizes.

Figure 11: iPerf networking benchmark results.

It took about 19.24 seconds to finish all queries for LEAP, while the time for TrustZone and SANCTUARY was 686.24 seconds and 61.64 seconds, respectively. LEAP is thus about 35.67× and 3.20× faster than TrustZone and SANCTUARY, respectively. TrustZone is much slower because it relies on the ROS driver to load files, which incurs frequent context switches when loading large files.
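The reported speedups follow directly from the measured totals, as this short check shows. The three timings are the ones stated in the text; the helper name `speedup` is only illustrative.

```python
def speedup(baseline_seconds: float, improved_seconds: float) -> float:
    """How many times faster the improved system is than the baseline."""
    return baseline_seconds / improved_seconds


# Total query times reported in the text, in seconds.
leap, trustzone, sanctuary = 19.24, 686.24, 61.64

vs_trustzone = round(speedup(trustzone, leap), 2)   # 686.24 / 19.24
vs_sanctuary = round(speedup(sanctuary, leap), 2)   # 61.64 / 19.24
```

Rounded to two decimals, the ratios are 35.67 and 3.20, matching the factors quoted above.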
LEAP is faster than SANCTUARY since the query App can dynamically change its memory size when handling files with different sizes.

Although it is possible to make SANCTUARY allocate a large memory size in advance to improve efficiency, this greatly wastes resources because most of this memory is unused most of the time. We measured the SANCTUARY performance with different memory allocation sizes and compared its efficiency and resource utilization with LEAP. The result is shown in Table 1. As Table 1 shows, when SANCTUARY increases the preallocated memory size, it indeed improves efficiency. However, the resource utilization also decreases. The resource utilization rate of LEAP is 92.13%. LEAP is 1.44× faster than SANCTUARY at a similar resource utilization rate (93.43%). When SANCTUARY and LEAP have similar performance, SANCTUARY's resource utilization rate is 21.74 percentage points lower than LEAP's. More importantly, when security-critical Apps need to handle a variety of workloads, it is difficult to choose an appropriate resource allocation strategy in advance that balances resource usage and application performance well.

To quantify the overhead of our flexible resource design, we further profiled the resource adjustment cost; the results show that resource adjustment can be performed efficiently. Specifically, it takes about 199ms and 137ms to add a big or little core, and 92ms and 72ms to remove a big or little core, respectively. It takes only 54ms for the sc-pAPP to add a block of memory, and releasing a block of memory completes in 56ms. The ability to adjust resources at such a small cost demonstrates the flexibility and efficiency of LEAP's resource adjustment.

We used the WiFi module to evaluate LEAP's performance when accessing peripherals. Although the WiFi module can be configured as secure through the TZPC so that it can be used in TrustZone directly, OP-TEE typically lacks the device drivers, so we cannot measure the performance of these devices in TrustZone. Since SANCTUARY relies on TrustZone to securely access peripherals, we cannot measure its performance either. Hence, we compare LEAP with native ROS to show our peripheral access efficiency.

Figure 12: Inferring time with different settings. (a) Darknet (CPU Version). (b) NCNN (CPU Version). (c) MNN (CPU Version). (d) MNN CPU vs. GPU.

We run iPerf [5] to benchmark the network throughput of LEAP and ROS when each of them uses the WiFi module. Although OP-TEE does not have a WiFi driver, it is possible to send its network data through ROS. iPerfTZ [18] is an open-source tool that measures the OP-TEE network throughput by forwarding network data to a client process running in the NW, so we run iPerfTZ to benchmark the network throughput of OP-TEE. As SANCTUARY relies on the TOS to securely access peripherals, this also represents the performance of SANCTUARY. Note that this is not a secure approach, but we have to measure OP-TEE in this way because it lacks a WiFi driver. The benchmarks were run with the same settings. We set the socket buffer size to 128KB and tested the network throughput with different TCP window sizes. Results are presented in Figure 11. The performance of accessing the network in LEAP is comparable to that of accessing the network in ROS. However, for OP-TEE, the network throughput of this naive solution is only about 12.5% of that of LEAP. The poor network throughput is due to the frequent context switches between ROS and TrustZone needed to transfer the network data.

We perform case studies of how a representative application, deep-learning inference with mobile GPU acceleration, adopts LEAP for secure model execution. According to recent studies [40, 42], it is currently feasible to steal such valuable on-device models from intelligent APPs. By applying LEAP, the model of the demo application can easily avoid being stolen and defend against other security attacks. We have selected three examples. The first one is an MNN-based [29] intelligent App that is deployed on the LEAP platform automatically through our LEAP Adaptor; the other two intelligent Apps are developed from scratch with the NCNN [7] and DarkNet [2] frameworks. Below, we first study the results of automatic adaptation, and then evaluate the system performance on these three examples.

Deploy with LEAP Adaptor.

Recall that our LEAP Adaptor works on existing Apps, and all operations are done on the intermediate code. This demo application (about 210,000 lines of intermediate code) performs deep-learning inference for image classification with Mali mobile GPU acceleration, representing a popular emerging application category. The sensitive part to protect contains the deep-learning model and its inference framework MNN. We adapt this intelligent App to LEAP through the LEAP Adaptor, described in Section 3.3. Our LEAP Adaptor takes 11s to complete the adaptation, occupies 1GB of memory, and uses two CPU cores. The Adaptor adds only 80 lines of code to the original App. The generated sc-pAPP has a total of 856 lines of code.

Develop from scratch.

We also adapt two example Apps manuallyto show how to develop a LEAP-enabled App from scratch. Thesplit is completed in the following steps. First, we add an integratedLEAP

ROS

API lib into the App project. Second, we add the functionof booting the LEAP sandbox in JNI code, and the code will becalled when the App starts. Third, we modify part of the JNI codethat switches the local DL framework, i.e., NCNN and Darknet, callto the "remote" DL framework call of the sandbox. Therefore, whenthere is an inferring request, it will be forwarded to LEAP sandbox,the inferring procedure will be performed in LEAP sandbox, and theinferring result will be sent back. The application with the modifiedJNI code is called pAPP. Finally, we package the sensitive codes assc-pAPP into the ramfs of a pre-distributed LEAP sandbox image.In order to access the Mali GPU securely, we just need to directlyput the Mali Linux kernel driver into the sandbox image.We evaluate the LEAP’s performance with these end-to-enddemo Apps. We develop several applications with different mod-els and DL frameworks and run the applications with both CPUand GPU of the prototype. In addition to conducting the measure-ment on LEAP, We train four models, i.e., SqueezeNet [27], Mo-bileNetV2 [37], DenseNet201 [26], and ResNet50 [23], for eachframework.Figure 12 shows the performance of running the demo appli-cations in both LEAP. The CPU version means running the demoapplication with the big or little cores on Hikey960. And morethan one cores represent the situation that it dynamically requestCPU cores from ROS for inference. When the demo applicationsdeployed in LEAP uses the CPU to perform the inference, the infer-ence speed for the big core is 2.9 × to 3.4 × faster than little core forDarkNet, and 2.5 × to 4.4 × for NCNN, and 2.3 × to 3.1 × for MNN, re-spectively. Moreover, LEAP’s flexible resource adjustment enablesthe inference speed on the big core to improve 1.2 × to 1.8 × forDarkNet, 1.6 × to 1.8 × for NCNN, and 1.4 × to 1.6 × for MNN.Without loss of generality, we compare the inference speed ofCPU and GPU based on MNN. 
As shown in Figure 12d, when the demo applications perform inference with a little core and the GPU, they are 1.2× to 3.5× faster than with two big cores. With a big core and the GPU, they are 1.5× to 4.9× faster than with two big cores. We also evaluate the performance of running the demo application in TrustZone directly. When the demo application runs in LEAP, inference is 1.4× faster than in vanilla TrustZone.

Malicious Driver.

In this work, we assume that the driver used by the developer is benign and bug-free. Although a malicious or buggy driver cannot affect other sandboxes, it may compromise the sandbox it resides in. To prevent this, existing driver isolation techniques could be adopted to keep a malicious driver from compromising its sandbox.

Parallel Peripheral Access.

LEAP supports parallel execution of multiple sc-pAPPs, but its exclusive IO design currently does not allow multiple sandboxes to access the same device at the same time. We think this is acceptable, since many sensors in Android, such as cameras, can only be used by one application at a time. Further research could seek a solution that enables devices to be securely shared among multiple mutually distrusting parties.
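
The exclusive IO policy described above can be illustrated with a minimal Python sketch. It is not LEAP's implementation: the class `ExclusiveDeviceTable`, its methods, and the device names are hypothetical, and the sketch only models the rule that a device is held by at most one sandbox at a time.

```python
class ExclusiveDeviceTable:
    """Toy model of exclusive peripheral ownership (all names are hypothetical)."""

    def __init__(self, devices):
        # Map each device name to the id of the sandbox holding it, or None if free.
        self._owner = {name: None for name in devices}

    def acquire(self, device, sandbox_id):
        """Grant the device only if no other sandbox currently holds it."""
        if device not in self._owner:
            raise KeyError(f"unknown device: {device}")
        holder = self._owner[device]
        if holder is not None and holder != sandbox_id:
            return False  # another sandbox has exclusive access
        self._owner[device] = sandbox_id
        return True

    def release(self, device, sandbox_id):
        """Free the device; only the current holder may release it."""
        if self._owner.get(device) != sandbox_id:
            raise PermissionError(f"sandbox {sandbox_id} does not hold {device}")
        self._owner[device] = None
```

In this model, a second sandbox that calls `acquire("gpu", 2)` while sandbox 1 holds the GPU is refused until sandbox 1 calls `release("gpu", 1)`; different devices can still be held by different sandboxes in parallel.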

Maximum Sandbox Number.

At present, our design relies on exclusive CPU and memory allocation. Therefore, the maximum number of sandboxes is limited by the number of CPU cores on the device. We plan to increase the sandbox density in future work to support more parallel environments.
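
To make the limit concrete, the following Python sketch models exclusive core assignment. It is an illustration under stated assumptions rather than LEAP's resource manager: `CorePool` and its methods are hypothetical names, every sandbox is assumed to need at least one dedicated core, and memory is not modeled.

```python
class CorePool:
    """Toy model of exclusive CPU core assignment (all names are hypothetical)."""

    def __init__(self, num_cores):
        self._free = set(range(num_cores))  # cores not owned by any sandbox
        self._assigned = {}                 # sandbox id -> set of owned core ids

    def create_sandbox(self, sandbox_id):
        """Start a sandbox on one dedicated core; fail if no core is free."""
        if sandbox_id in self._assigned:
            raise ValueError(f"sandbox {sandbox_id} already exists")
        if not self._free:
            raise RuntimeError("no free core: sandbox limit reached")
        core = min(self._free)
        self._free.remove(core)
        self._assigned[sandbox_id] = {core}
        return core

    def request_extra_core(self, sandbox_id):
        """Give a running sandbox one more core, or return None if none is free."""
        if sandbox_id not in self._assigned:
            raise KeyError(f"unknown sandbox: {sandbox_id}")
        if not self._free:
            return None
        core = min(self._free)
        self._free.remove(core)
        self._assigned[sandbox_id].add(core)
        return core

    def destroy_sandbox(self, sandbox_id):
        """Return all cores of the sandbox to the free pool."""
        self._free |= self._assigned.pop(sandbox_id)

    def max_additional_sandboxes(self):
        """With one core per sandbox, free cores bound the number of new sandboxes."""
        return len(self._free)
```

With `CorePool(4)`, a fifth call to `create_sandbox` fails, and each extra core granted through `request_extra_core` reduces the number of sandboxes that can still be started by one.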

In this paper, we challenge TrustZone’s evolution strategy and argue that future TrustZone should be a developer-centric TEE that fulfills App developers’ growing security demands. We comprehensively analyze the design requirements of App developers and present LEAP, a lightweight developer-centric TEE for mobile Apps, to respond to developers’ needs. We implement the LEAP prototype on Hikey960 and conduct comprehensive analyses to show that LEAP can balance security and usability in mobile scenarios. We believe LEAP can provide new ideas for future TrustZone.

REFERENCES

[11] PrimeCell Infrastructure AMBA 3 TrustZone Protection Controller (BP147), 2004.
[12] ARM CoreLink TZC-400 TrustZone Address Space Controller, 2013.
[13] ARM Cortex-A Series – Programmer’s Guide for ARMv8-A, 2015.
[14] ARM System Memory Management Unit Architecture Specification, 2016.
[15] Brasser, F., Gens, D., Jauernig, P., Sadeghi, A.-R., and Stapf, E. Sanctuary: Arming trustzone with user-space enclaves. In NDSS (2019).
[16] Cerdeira, D., Santos, N., Fonseca, P., and Pinto, S. Sok: Understanding the prevailing security vulnerabilities in trustzone-assisted tee systems. In (2020).
[17] Cho, Y., Shin, J., Kwon, D., Ham, M., Kim, Y., and Paek, Y. Hardware-assisted on-demand hypervisor activation for efficient security critical code execution on mobile devices. In (2016).
[18] Göttel Christian, Felber Pascal, S. V. iperftz: Understanding network bottlenecks for trustzone-based trusted applications. In Stabilization, Safety, and Security of Distributed Systems (2019).
[19] Gruss, D., Maurice, C., Wagner, K., and Mangard, S. Flush+flush: A fast and stealthy cache attack. In International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (2016).
[20] Gruss, D., Spreitzer, R., and Mangard, S. Cache template attacks: Automating attacks on inclusive last-level caches. In (2015).
[21] Guan, L., Liu, P., Xing, X., Ge, X., Zhang, S., Yu, M., and Jaeger, T. Trustshadow: Secure execution of unmodified applications with arm trustzone. In Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services (2017).
[22] Halderman, J. A., Schoen, S. D., Heninger, N., Clarkson, W., Paul, W., Calandrino, J. A., Feldman, A. J., Appelbaum, J., and Felten, E. W. Lest we remember: cold-boot attacks on encryption keys. Communications of the ACM (2009), 91–98.
[23] He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In arXiv:1512.03385 (2015).
[24] Hua, Z., Gu, J., Xia, Y., Chen, H., Zang, B., and Guan, H. vtz: Virtualizing arm trustzone. In (2017).
[25] Huang, A. Keeping secrets in hardware: The microsoft xbox tm case study. In International Workshop on Cryptographic Hardware and Embedded Systems (2002).
[26] Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K. Q. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (2017).
[27] Iandola, F. N., Han, S., Moskewicz, M. W., Ashraf, K., Dally, W. J., and Keutzer, K. Squeezenet: Alexnet-level accuracy with 50x fewer parameters and < arXiv:1602.07360 (2016).
[28] Jang, J., Choi, C., Lee, J., Kwak, N., Lee, S., Choi, Y., and Kang, B. B. Privatezone: Providing a private execution environment using arm trustzone. IEEE Transactions on Dependable and Secure Computing (2016), 797–810.
[29] Jiang, X., Wang, H., Chen, Y., Wu, Z., Wang, L., Zou, B., Yang, Y., Cui, Z., Cai, Y., Yu, T., Lv, C., and Wu, Z. Mnn: A universal and efficient inference engine. In MLSys (2020).
[30] Keller, E., Szefer, J., Rexford, J., and Lee, R. B. Nohype: virtualized cloud infrastructure without the virtualization. In Proceedings of the 37th annual international symposium on Computer architecture (2010).
[31] Kuhn, M. G. Cipher instruction search attack on the bus-encryption security microcontroller ds5002fp. IEEE Transactions on Computers (1998), 1153–1157.
[32] Kwon, D., Seo, J., Cho, Y., Lee, B., and Paek, Y. Pros: Light-weight privatized secure oses in arm trustzone. IEEE Transactions on Mobile Computing (2019), 1434–1447.
[33] Li, W., Xia, Y., Lu, L., Chen, H., and Zang, B. Teev: virtualizing trusted execution environments on mobile platforms. In Proceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (2019).
[34] Osvik, D. A., Shamir, A., and Tromer, E. Cache attacks and countermeasures: the case of aes. In Cryptographers’ track at the RSA conference (2006).
[35] Ren, J., Qi, Y., Dai, Y., Wang, X., and Shi, Y. Appsec: A safe execution environment for security sensitive applications. In Proceedings of the 11th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (2015).
[36] Rubinov, K., Rosculete, L., Mitra, T., and Roychoudhury, A. Automated partitioning of android applications for trusted execution environments. In (2016).
[37] Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.-C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018).
[38] Song, W., Yin, H., Liu, C., and Song, D. Deepmem: Learning graph neural network models for fast and robust memory forensic analysis. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (2018).
[39] Sun, H., Sun, K., Wang, Y., Jing, J., and Wang, H. Trustice: Hardware-assisted isolated computing environments on mobile devices. In (2015).
[40] Sun, Z., Sun, R., and Lu, L. Mind your weight (s): A large-scale study on insufficient machine learning model protection in mobile apps. In arXiv:2002.07687 (2020).
[41] Vallée-Rai, R., Gagnon, E., Hendren, L., Lam, P., Pominville, P., and Sundaresan, V. Optimizing java bytecode using the soot framework: Is it feasible? In International conference on compiler construction (2000).
[42] Xu, M., Liu, J., Liu, Y., Lin, F. X., Liu, Y., and Liu, X. A first look at deep learning apps on smartphones. In The World Wide Web Conference (2019).
[43] Yarom, Y., and Falkner, K. Flush+reload: a high resolution, low noise, l3 cache side-channel attack. In (2014).
[44] Zhao, S., Zhang, Q., Qin, Y., Feng, W., and Feng, D. Sectee: A software-based approach to secure enclave architecture using tee. In