SoftTRR: Protect Page Tables Against RowHammer Attacks using Software-only Target Row Refresh
Zhi Zhang, Yueqiang Cheng, Minghua Wang, Wei He, Wenhao Wang, Nepal Surya, Yansong Gao, Kang Li, Zhe Wang, Chenggang Wu
Zhi Zhang∗†, Yueqiang Cheng∗‡, Minghua Wang§, Wei He¶, Wenhao Wang‖, Nepal Surya†, Yansong Gao†∗∗, Kang Li§, Zhe Wang††, Chenggang Wu††
∗ Both authors contributed equally to this work
† Data61, CSIRO, Australia. Email: {zhi.zhang,surya.nepal}@data61.csiro.au
‡ NIO Security Research
§ Baidu Security. Email: {wangminghua01,kangli01}@baidu.com
¶ SKLOIS, Institute of Information Engineering, CAS and School of Cyber Security, University of Chinese Academy of Sciences
‖ Institute of Information Engineering, CAS
∗∗ NanJing University of Science and Technology, China
†† Institute of Computing Technology, Chinese Academy of Sciences. Email: {wangzhe12,wucg}@ict.ac.cn
Abstract: Rowhammer attacks that corrupt level-1 page tables to gain kernel privilege are the most detrimental to system security and hard to mitigate. However, recently proposed software-only mitigations are not effective against such kernel privilege escalation attacks.
In this paper, we propose an effective and practical software-only defense, called SoftTRR, to protect page tables from all existing rowhammer attacks on x86. The key idea of SoftTRR is to refresh the rows occupied by page tables when suspicious rowhammer activity is detected. SoftTRR is motivated by DRAM-chip-based target row refresh (ChipTRR) but eliminates its main security limitation (i.e., ChipTRR tracks a limited number of rows and thus can be bypassed by many-sided hammer [11]). Specifically, SoftTRR protects an unlimited number of page tables by tracking memory accesses to the rows that are in close proximity to page-table rows and refreshing the page-table rows once the tracked access count exceeds a pre-defined threshold. We implement a prototype of SoftTRR as a loadable kernel module, and evaluate its security effectiveness, performance overhead, and memory consumption. The experimental results show that SoftTRR protects page tables from real-world rowhammer attacks and incurs small performance overhead as well as memory cost.
Keywords: Rowhammer, Target Row Refresh, Page Table, Software-only Defense
I. INTRODUCTION
Rowhammer is a software-induced dynamic random-access memory (DRAM) vulnerability in which frequently accessing (i.e., hammering) DRAM aggressor rows induces bit flips in neighboring victim rows. An attacker can hammer aggressor rows to corrupt different types of sensitive objects on victim rows without access to them, breaking memory management unit (MMU)-based memory protection, achieving privilege escalation (e.g., [33], [7], [47]) or leaking sensitive information (e.g., [5], [26]). Of the many sensitive objects that have been corrupted, page table corruption is the most detrimental to system security, making kernel privilege escalation attacks the mainstream [43]. To date, kernel privilege escalation attacks (e.g., [33], [14], [40], [45], [7], [47]) focus on corrupting level-1 page table entries (L1PTEs), and some of them have been demonstrated to gain kernel privilege from unprivileged applications [33], [7], [47], or even from JavaScript in webpages [14].
Multiple software-only mitigation schemes [6], [43], [23] can be used to mitigate the kernel privilege escalation attacks. Compared to hardware defenses (e.g., [29], [19], [36], [27]), software-only schemes have the appeal of compatibility with existing hardware, allowing better deployability. However, existing software-only mitigations are not effective against all the kernel privilege escalation attacks. Specifically, both CATT [6] and CTA [43] are vulnerable to a state-of-the-art privilege escalation attack (i.e., PThammer [47]) that targets L1PTEs. ZebRAM [23] works only in hardware-assisted virtualized settings and requires both kernel and hypervisor modifications. On top of that, ZebRAM assumes that bit flips occur only in a victim row that is one row away from the hammered aggressor row(s), making it unable to defend against rowhammer (kernel privilege escalation) attacks that target a victim row two rows away from the hammered rows [21], [47]. To this end, we ask:
Is there an effective and practical software-only defense that protects page tables against all rowhammer attacks?
Our Contributions.
In this paper, we provide a positive answer to the above question. We propose a new software-only defense, called SoftTRR, that defends against all kernel privilege escalation attacks on x86. SoftTRR is motivated by a hardware defense, namely ChipTRR [29], [19]. ChipTRR is designed to identify possible victim rows by counting the rows' activations and to refresh rows to suppress bit flips when their counters reach a pre-defined threshold. ChipTRR was believed to close the rowhammer attack venue in present-day DDR4-based systems, until it was completely circumvented by [11].
We observe that the root cause of the failure of ChipTRR is that it tracks a very limited number of rows; bit flips are thus still possible when multiple rows are being hammered and the number of hammered rows is larger than the number of tracked rows (also known as many-sided hammer [11]). SoftTRR addresses this limitation by monitoring and tracking all rows neighboring (victim) rows that contain page tables. SoftTRR leverages the MMU-enforced virtual memory subsystem to frequently track memory accesses to any rows adjacent to page-table rows, and refreshes page-table rows when necessary, making SoftTRR effective in preventing rowhammer from breaking page table integrity.
Specifically, the MMU is an essential component of modern processors that supports the OS kernel in enforcing memory isolation. With the assistance of the MMU, the kernel configures page tables, mediates every memory access from user space, and captures any unauthorized access that triggers a hardware exception. On top of that, the kernel can capture a memory access whose relevant page-table entry has an unused rsrv bit set. With this observation, SoftTRR uses the kernel as the root of trust and frequently configures page tables with the rsrv bit set to track memory accesses to rows that neighbor rows of page tables. When the tracked memory-access counters reach a pre-defined threshold, the corresponding page-table rows are refreshed.
By SoftTRR's design, an adjacent or neighboring row can be multiple rows away from a page-table row, thus voiding the above assumption of a one-row distance between victim and aggressor rows made by ZebRAM [23].
Our prototype implementation of SoftTRR is a loadable kernel module (LKM) that requires no modification to the kernel. The LKM has 2400 source lines of code and has been deployed onto three Linux systems whose underlying hardware has either DDR3 or DDR4 chips. We evaluated a SoftTRR-deployed system in terms of security effectiveness, performance, memory consumption and robustness. The experimental results show that SoftTRR is effective in mitigating rowhammer privilege escalation attacks. Besides, SoftTRR incurs low overhead and its memory consumption is within hundreds of KiB in a real-world use case of LAMP (i.e., Linux, Apache, MySQL and PHP). We also validate the robustness of a SoftTRR-enabled system using system-call stress tests, the results of which show that the system runs as stably as a vanilla system.
In summary, the main contributions are as follows:
• We introduce SoftTRR, an effective and practical software-only mitigation scheme to protect page tables against rowhammer attacks. To the best of our knowledge, SoftTRR is the first bare-metal solution to defend against existing kernel privilege escalation attacks.
• We present a lightweight SoftTRR system to collect page tables, track memory accesses, and refresh target page tables by leveraging MMU and OS kernel features.
• We evaluate SoftTRR's effectiveness against 3 representative rowhammer attacks, its performance overhead and its memory consumption. The experimental results show that SoftTRR successfully protects page tables against the attacks, and incurs negligible performance overhead and memory cost.
The rest of the paper is structured as follows. In Section II, we introduce address translation, DRAM, the rowhammer vulnerability and rowhammer defenses. In Sections III and IV, we present the design and implementation of SoftTRR.
Sections V and VI evaluate SoftTRR's security effectiveness and performance impacts, respectively. We discuss and conclude this paper in Section VII and Section VIII.
II. BACKGROUND AND RELATED WORK
In this section, we first describe address translation, DRAM and its address mapping. We then present the rowhammer vulnerability as well as its hardware and software defenses.
A. Address Translation
The MMU enforces virtual memory primarily by means of the paging mechanism. Paging on the x86-64 platform usually uses four levels of page tables to translate a virtual address to a physical address. As such, virtual-address bits are usually divided into the following parts.
Bits 39–47 are used to index a page map level 4 table entry (PML4 or level-4 page table). The physical base address of the PML4 is stored in the CR3 control register. Bits 30–38 are used to index a page directory pointer table entry (PDPT or level-3 page table). The physical base address of the PDPT comes from the PML4 entry. Bits 21–29 are used to index a page directory table entry (PD or level-2 page table) and the physical base address of the PD comes from the PDPT entry. Bits 12–20 are used to index a page table entry (PT or level-1 page table) and the physical base address of the PT comes from the PD entry. The indexed PT entry points to a physical page and the remaining bits, i.e., 0–11, are the page offset. If the physical address is within a huge page, only two or three levels of page tables are needed to translate a virtual address within a huge page of 1 GiB or 2 MiB, respectively. To speed up address translation, the Translation Look-aside Buffer (TLB) caches address translations, while the CPU cache stores the accessed data as well as the page table entries of all levels.
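The bit ranges above can be made concrete with a small sketch. It only illustrates how a 48-bit virtual address decomposes for 4 KiB pages; the function name `split_virtual_address` and the sample address are illustrative assumptions, not part of SoftTRR.

```python
def split_virtual_address(vaddr: int) -> dict:
    """Split a 48-bit x86-64 virtual address into its 4-level paging indexes."""
    return {
        "pml4_index": (vaddr >> 39) & 0x1FF,  # bits 39-47 index the PML4
        "pdpt_index": (vaddr >> 30) & 0x1FF,  # bits 30-38 index the PDPT
        "pd_index": (vaddr >> 21) & 0x1FF,    # bits 21-29 index the PD
        "pt_index": (vaddr >> 12) & 0x1FF,    # bits 12-20 index the PT
        "page_offset": vaddr & 0xFFF,         # bits 0-11 are the page offset
    }


# Arbitrary sample address chosen for illustration.
parts = split_virtual_address(0x7F1234567ABC)
print(parts)
```

Each index is 9 bits wide, so every table holds 512 entries, and the 12-bit offset addresses one byte within a 4 KiB page.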
B. DRAM
The main memory of most modern computers uses DRAM. Memory modules are usually produced in the form of a dual inline memory module (DIMM), where both sides of the memory module have separate electrical contacts for memory chips. Each memory module is directly connected to the CPU's memory controller through one of the two channels. Logically, each memory module consists of two ranks, corresponding to its two sides, and each rank consists of multiple banks. A bank is structured as arrays of memory cells with rows and columns.
Every cell of a bank stores one bit of data whose value depends on whether the cell is electrically charged or not. A row is the basic unit of memory access. Each access to a bank "opens" a row by transferring the data from all of the cells of the row to the bank's row buffer, which acts as a cache for the most recently accessed row. This operation discharges all the cells of the row. To prevent data loss, the row buffer is then copied back into the cells, thus recharging them. Consecutive accesses to the same row are served from the row buffer, while accessing another row flushes the row buffer. As the charge stored in a DRAM cell dissipates over time, every cell's charge must be restored, or refreshed, once in a specified time period. The typical refresh period is 64 milliseconds (ms).
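The row-buffer behavior described above can be illustrated with a toy model. This is a deliberately simplified sketch: it ignores CPU caches, refresh, and memory-controller policies, and the names `Bank` and `count_activations` are hypothetical.

```python
class Bank:
    """Toy model of one DRAM bank with a single row buffer."""

    def __init__(self):
        self.open_row = None   # row currently held in the row buffer
        self.activations = {}  # row index -> number of times the row was opened

    def access(self, row: int) -> bool:
        """Access a row; return True on a row-buffer hit, False otherwise."""
        if row == self.open_row:
            return True  # served from the row buffer, no new activation
        self.open_row = row  # opening another row replaces the buffer content
        self.activations[row] = self.activations.get(row, 0) + 1
        return False


def count_activations(pattern):
    """Replay a sequence of row accesses and return per-row activation counts."""
    bank = Bank()
    for row in pattern:
        bank.access(row)
    return bank.activations


print(count_activations([5] * 1000))    # repeated access to one row
print(count_activations([4, 6] * 500))  # alternating between two rows
```

In this model, accessing a single row repeatedly opens it only once, whereas alternating between two rows of the same bank opens a row on every access, which is why hammer patterns typically involve more than one row per bank.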
DRAM Address Mapping.
The memory controller decides how physical-address bits are mapped to a DRAM address. A DRAM address refers to a 3-tuple of bank, row and column (DIMM, channel, and rank are included in the bank tuple field). This mapping is not publicly documented on the Intel processor platform. Seaborn et al. [32] observed that rowhammer bit flips can only be induced by accessing different rows within the same bank and, based on this observation, made an educated guess on the DRAM address mapping of an Intel Sandy Bridge CPU. Peter et al. [31], Xiao et al. [45] and Wang et al. [42] exploited a timing side channel [30] to uncover the mapping: accessing two virtual addresses that reside in different rows of the same bank leads to higher access latency than accessing addresses that are in different banks or in the same row of the same bank.
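To make the notion of a DRAM address mapping concrete, here is a minimal sketch. It assumes a mapping in which each bank bit is the XOR of a few physical-address bits and the row index is a contiguous bit range; the masks, shifts and names below are entirely hypothetical and do not describe any real CPU or DIMM.

```python
# Hypothetical mapping parameters, chosen only for illustration.
BANK_BIT_MASKS = [
    (1 << 13) | (1 << 17),  # bank bit 0 = XOR of physical-address bits 13 and 17
    (1 << 14) | (1 << 18),  # bank bit 1 = XOR of physical-address bits 14 and 18
    (1 << 15) | (1 << 19),  # bank bit 2 = XOR of physical-address bits 15 and 19
]
ROW_SHIFT = 17        # assumed: row index starts at physical-address bit 17
ROW_MASK = 0xFFFF     # assumed: 16 row bits
COLUMN_MASK = 0x1FFF  # assumed: bits 0-12 select the column


def parity(value: int) -> int:
    """Return the XOR of all bits in value."""
    return bin(value).count("1") & 1


def dram_address(paddr: int):
    """Map a physical address to a (bank, row, column) tuple."""
    bank = 0
    for i, mask in enumerate(BANK_BIT_MASKS):
        bank |= parity(paddr & mask) << i
    row = (paddr >> ROW_SHIFT) & ROW_MASK
    column = paddr & COLUMN_MASK
    return bank, row, column


def same_bank_different_row(a: int, b: int) -> bool:
    """True for the address pairs that show higher latency in the timing side channel."""
    bank_a, row_a, _ = dram_address(a)
    bank_b, row_b, _ = dram_address(b)
    return bank_a == bank_b and row_a != row_b


print(dram_address(1 << 17))
print(same_bank_different_row(1 << 13, 1 << 17))
```

With a recovered mapping of this shape, a defense can decide whether two physical pages share a bank and how many rows apart they are.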
C. Rowhammer Vulnerability
Kim et al. [22] were the first to perform a large-scale study of rowhammer on DDR3 modules. Their results show that the vulnerability can be triggered by software accesses: frequently accessing rows i+1 and i−1 (i.e., aggressor rows) causes bit flips (i.e., charge leakage) in row i (i.e., the victim row).
There are four hammer patterns in total. First, double-sided hammer refers to the case where the two rows immediately adjacent to the victim row are hammered simultaneously, which is the most effective hammer pattern for inducing bit flips on DDR3 chips [33]. Second, single-sided hammer randomly picks two aggressor rows in the same bank and hammers them [33]. Third, one-location hammer selects a single aggressor row to hammer. This pattern only applies to certain systems where the DRAM controller employs an advanced policy to optimize performance [13]. Last, many-sided hammer chooses more than two aggressor rows within the same bank. The aggressor rows are usually separated by one row, and two of them are exactly adjacent to a victim row [11].
D. Rowhammer Defenses
Hardware Solutions.
Existing hardware solutions employed by the industry can be summarized into three categories. The first is to decrease the DRAM refresh period [22] so that all DRAM rows are refreshed more frequently. For instance, three computer manufacturers (i.e., HP [16], Lenovo [28] and Apple [1]) deployed firmware updates to decrease the refresh period from 64 ms to 32 ms. However, CLFLUSH-free rowhammer attacks [2] can still induce bit flips within a refresh period of 32 ms, while decreasing the refresh period by 8x imposes unacceptable overhead on systems [22]. The second, proposed by Intel [18], leverages Error Correcting Code (ECC) memory to correct single-bit errors and detect double-bit errors. However, ECC has been reverse-engineered and is vulnerable to rowhammer [9]. The last is to refresh a row when its adjacent rows are activated frequently, the so-called activation counter-based approaches (e.g., [34], [22], [36], [35], [29], [19], [27]). ChipTRR [19], [29] is such an approach and has been adopted by recent DDR4 chip manufacturers. Nevertheless, ChipTRR has been reverse-engineered and defeated by TRRespass [11].
Software Defenses.
Software defenses include both mitigation and detection techniques. As sensitive data is required to be within victim rows for exploitation, existing mitigation techniques modify the memory allocator and thus enforce DRAM-aware memory isolation at different granularities [41], [39], [3], [6], [43], [23]. CATT [6] implements DRAM isolation between user and kernel memory. CTA [43] provides a dedicated DRAM region for level-1 page tables. ZebRAM [23] physically isolates rows of sensitive data in a zebra pattern. These three defenses can be used to prevent page tables from being hammered. ALIS [39] isolates DMA memory to prevent remote rowhammer attacks (e.g., Throwhammer [39]) targeting a key-value user application. RIP-RH [3] provides DRAM isolation between user processes to safeguard them.
Anvil [2] utilizes CPU performance counters to monitor the cache miss rate and detect a rowhammer attack, as typical rowhammer attacks incur frequent cache misses. However, Anvil is prone to false positives and/or false negatives [43], [6]. Besides, its current implementation cannot detect the PThammer attack [47]. The other detection technique is RADAR [46]. As rowhammer attacks exhibit recognizable rowhammer-correlated sideband patterns in the spectrum of the DRAM clock signal, RADAR leverages customized peripheral devices to capture and analyze the electromagnetic signals emitted by a DRAM-based system.
Figure 1. SoftTRR Overview. SoftTRR is a kernel module and has three main components. Page Table Collector maintains information about page-table pages and their adjacent pages in close proximity. Adjacent Page Tracer traces accesses to the maintained adjacent pages and updates charge-leak counters for relevant rows of page-table pages. When the counters reach a pre-defined limit, Row Refresher is triggered to refresh the desired rows hosting page-table pages. In comparison, rows that have no page tables (highlighted in green) are vulnerable to bit flips.
III. SOFTTRR: SOFTWARE-ONLY TARGET ROW REFRESH
We discuss the threat model and assumptions in Section III-A, the design principles in Section III-B and the design overview in Section III-C. Section IV describes the implementation details.
A. Threat Model and Assumptions
Our primary goal is to protect page tables and guarantee that an adversary cannot corrupt them to achieve kernel privilege escalation through rowhammer on x86 architectures. In our implementation of SoftTRR, we focus on protecting level-1 page tables (L1PTs), the same goal as in CTA [43], because page-table-oriented rowhammer attacks all aim at corrupting L1PTs. Even when other levels of page tables are corrupted, they are still not exploitable [43]. In spite of that, SoftTRR can be extended to protect other levels of page tables, which we discuss in Section VII.
We assume the kernel as our root of trust, and that the kernel module enforcing our SoftTRR protection is well protected. We consider threats coming from both local and remote adversaries. A local adversary resides in a low-privilege user process and thus can execute arbitrary code within her privilege boundary. A remote adversary stays outside the system and launches an attack, e.g., through a website with JavaScript.
The DRAM address mapping is assumed to be available, as it can be easily collected using prior works [45], [31], [42]. Besides, previous software-only rowhammer defenses [6], [43], [3], [23] consider that hammering row i only affects row i+1 and row i−1, which however is not consistent with a recent work by Kim et al. [21]. Specifically, Kim et al. performed a comprehensive study of 300 DRAM modules from three major DRAM manufacturers and found that bit flips can occur in row i+2 or row i−2, which are two rows away from the hammered row i, in both DDR3 and DDR4 chips. SoftTRR by design protects rows of page tables from being flipped by rows that are N rows away, and its current implementation follows the above work of Kim et al.; that is, the distance between an adjacent row and an L1PT row is either one or two rows.
B. Design Principles
SoftTRR follows the security and practicality design principles described below. The security principle is to guarantee that SoftTRR can defend against all existing rowhammer attacks targeting page tables. The practicality principles aim to make SoftTRR applicable to real-world systems.
• DP1: SoftTRR should be effective in protecting ALL page tables. Without this completeness guarantee, an attacker can gain kernel privilege by compromising the integrity of page tables that are not protected by SoftTRR.
• DP2: SoftTRR should be compatible with OS kernels. It should neither modify or add kernel source code nor break kernel code integrity through binary instrumentation, as either would hinder its adoption in practice.
• DP3: SoftTRR should impose low performance overhead on a protected system.
C. Design Overview
SoftTRR, residing in the kernel space, actively collects all page tables, and monitors their entire life cycle from page-table creation to page-table release. For each collected page-table page, SoftTRR identifies all its adjacent pages in DRAM and traces memory accesses to the adjacent pages. Thus, SoftTRR is aware of which adjacent page is accessed. When the traced access count reaches a pre-defined threshold, SoftTRR knows which page-table page is at risk of being flipped and promptly refreshes the page (satisfying DP1).
All existing software-only mitigation techniques (see Section II) deeply hack into the memory allocator to become DRAM-aware and add extra allocation/deallocation constraints. Unlike them, SoftTRR only acquires offline domain knowledge (i.e., DRAM address mappings of physical addresses), without requiring a new memory allocator or changing legacy allocator logic (satisfying DP2).
SoftTRR configures page tables to trace memory accesses to those adjacent pages. An access to an adjacent page thus raises a hardware exception, which is captured by SoftTRR for the tracing purpose. If no such access occurs, no overhead is introduced. Accesses to non-adjacent pages therefore run at full speed, which confines the performance overhead to accesses to adjacent pages (satisfying DP3).
As shown in Figure 1, SoftTRR has three critical components.
Page Table Collector actively collects all page tables and maintains their page and DRAM information. On top of that, it also collects and maintains adjacent pages. Besides being accessible to unprivileged users, a page is considered an adjacent page if either the page itself or its corresponding page-table page is adjacent to (i.e., within N rows of) a page-table page. This is based on an observation from Zhang et al. [47]. In particular, rowhammer attacks corrupting page tables are classified into two categories. Explicit attacks require attacker-accessible memory adjacent to L1PT pages. Implicit attacks only need mutual adjacency among L1PT pages.
Adjacent Page Tracer keeps a close watch over memory accesses to collected adjacent pages and maintains a charge-leak counter for each row where a page-table page resides. If any one of the adjacent pages has been accessed, the charge-leak counters of nearby page-table rows are updated accordingly, indicating that the page-table rows leak charge with a higher probability.
Row Refresher remains dormant as long as no charge-leak counter reaches a pre-defined limit. Once a counter reaches the limit, a rowhammer attempt is believed to be taking place and the above tracer triggers the row refresher. In response, the row refresher promptly refreshes the rows whose charge-leak counters have reached the limit.
In the following section, we describe our implementation details before analyzing its security effectiveness and evaluating its performance overhead.
IV. IMPLEMENTATION
As stated in Section III-A, SoftTRR implements L1PT protection, and a row of adjacent pages can be either one or two rows away from a row of L1PT pages. Our prototype implementation is a loadable kernel module (LKM) without modifications to the kernel. The LKM consists of around 2400 source lines of code and works with an Ubuntu 16.04 installation running a default Linux kernel. Before describing how the three aforementioned components of SoftTRR are implemented, we first introduce the important data structures.
A. Data Structures
We reuse the kernel's red-black tree structure [10], an efficient self-balancing binary search tree that guarantees searching in O(log n) time (n is the number of tree nodes). As shown in Table I, we have three red-black trees (i.e., pt_rbtree, adj_rbtree, pt_row_rbtree) and a ring buffer called pte_ringbuf.
Specifically, pt_rbtree stores L1PT page information while adj_rbtree stores information of pages that are adjacent to L1PT pages. For the two trees, the physical page number (PPN) of a 4 KiB page is used as the node key, and thus a new node is allocated when information of a new L1PT page or adjacent page needs to be stored. Besides, pt_row_rbtree stores DRAM information about L1PT pages. For this tree, row_index works as the node key and a node can have one or more bank structures (i.e., bank_struct). One bank structure stores a bank_index that one or more L1PT pages own (e.g., multiple L1PT pages share the same row of the same bank). Also note that a page can span multiple banks [42] and thus an L1PT page can have multiple bank_struct. pt_count records the number of L1PT PPNs that are in the same row of the same bank. leak_count, short for the charge-leak counter in Section III-C, stores the number of accesses to rows that are adjacent to the row of row_index in the same bank.
For a given DRAM module, we leverage a publicly available tool called DRAMA (https://github.com/IAIK/drama) to reverse-engineer its DRAM address mapping, and embed the mapping into the kernel before SoftTRR acquires a physical page's DRAM information. We allocate each node of each tree using the slab allocator [4], which, compared to the buddy allocator, is an efficient memory management mechanism intended for the kernel's small object allocation.
pte_ringbuf stores information of lowest-level page table entries (PTEs) that are collected by the adjacent page tracer (see Section IV-C). These PTEs point to either adjacent pages themselves or huge pages containing adjacent pages. If the adjacent page is a 4 KiB page, the PTE is an L1PT entry. If the adjacent page is part of a huge page (i.e., 2 MiB or 1 GiB), the PTE is either an L2PT entry or an L3PT entry. Each node of pte_ringbuf is a structure that has three main fields, also shown in Table I. Particularly, pte is a pointer to the lowest-level PTE. vaddr is a virtual address referring to an adjacent page or its corresponding huge page. mm is a pointer to a kernel structure (i.e., mm_struct) describing the address space of the process to which the adjacent page belongs. The adjacent page tracer combines vaddr and mm to flush the TLB entry that stores the adjacent page's virtual-to-physical address mapping.
Table I. Data structures used by SoftTRR.
Structure       Main fields in a node     Description
pt_rbtree       PPN                       A unique page frame number of an L1PT page.
adj_rbtree      PPN                       A unique page frame number of an adjacent page.
pt_row_rbtree   row_index                 One row index of one or more L1PT pages.
                bank_struct.bank_index    One bank index of one or more L1PT pages.
                bank_struct.pt_count      The number of L1PT pages that have the same bank and row indexes.
                bank_struct.leak_count    The number of accesses to rows adjacent to the row of row_index and bank_index.
pte_ringbuf     pte                       One pointer to a page table entry relevant to an adjacent page.
                vaddr                     A virtual address relevant to an adjacent page.
                mm                        One pointer to the mm_struct of the process where an adjacent page resides.
B. Page Table Collector
For processes that are already in memory before our module is loaded, the page table collector walks the list of task_struct to find every existing process. It then performs a page-table walk for every virtual page in each valid virtual memory area (VMA) of each user process to collect information of L1PT pages and their adjacent pages. Specifically, pt_rbtree stores distinct L1PT pages and pt_row_rbtree stores their DRAM bank and row indexes. To build adj_rbtree, the collector first finds all user pages that are adjacent to L1PT pages in DRAM. It then selects all L1PT pages from pt_rbtree that are adjacent to each other and puts all PPNs pointed to by the selected L1PT pages' valid entries into adj_rbtree. Free pages that are adjacent to L1PT pages and allocated for use later (e.g., a free page that is allocated and mapped to the user space right after the collector finishes collecting all adjacent pages) are handled by the adjacent page tracer.
For L1PT pages that are dynamically allocated or freed after the above collection, we apply dynamic inline hooks to multiple kernel functions. An inline hook, also called a trampoline or detours hook, is a method of receiving control when a hooked function is called. The control flow is redirected by overwriting the first few (e.g., five) bytes of a target function. Dynamic kernel hooking only requires loading a kernel module, without kernel recompilation or binary rewriting, making it easy to deploy in practice (e.g., Kprobes, Kpatch) [25], [24], [12].
Figure 2. Page-Fault Error Code. Flag bits shown: P, W/R, U/S, RSVD, I/D, PK and SGX; the remaining bits are reserved. For the P bit, 0 means that the fault was caused by a non-present page and 1 means that the fault was caused by a page-level protection violation. The RSVD bit relates to the page-table entry.
We leverage a library to hook two kernel functions, i.e., __pte_alloc and __free_pages. __pte_alloc traces newly allocated L1PT pages, and __free_pages monitors dynamically released L1PT pages as well as adjacent pages. The collector hooks these two functions to update the three red-black trees as follows:
• For a newly allocated L1PT page, its page index is added to pt_rbtree and its bank and row indexes are added to pt_row_rbtree. If there are user pages that are adjacent to the L1PT page, they are added to adj_rbtree.
• If an adjacent page is freed, it is removed from adj_rbtree.
• If an L1PT page is freed, it is removed from pt_rbtree. Also, the collector looks up the node in pt_row_rbtree that has the freed page's row index. Within the node, pt_count in each bank_struct corresponding to the freed page is decremented by one. If every pt_count of the node becomes 0, the node is deleted from pt_row_rbtree. Besides, the freed page's adjacent pages in adj_rbtree are removed.
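The bookkeeping rules above can be sketched as a small user-space model. This is not the kernel module: Python sets and dictionaries stand in for the red-black trees, and the class `PageTableCollector`, its methods, the `dram_of` callback and the `adj_of` helper map are hypothetical names introduced for illustration. One further assumption, not stated in the text, is that an adjacent page is kept in `adj_rbtree` while another live L1PT page is still adjacent to it.

```python
class PageTableCollector:
    """User-space model of the collector's updates on L1PT allocation and page free."""

    def __init__(self, dram_of):
        self.dram_of = dram_of   # hypothetical: PPN -> list of (bank_index, row_index)
        self.pt_rbtree = set()   # PPNs of L1PT pages
        self.adj_rbtree = set()  # PPNs of adjacent pages
        self.pt_row_rbtree = {}  # row_index -> {bank_index: {"pt_count", "leak_count"}}
        self.adj_of = {}         # hypothetical helper: L1PT PPN -> its adjacent PPNs

    def on_l1pt_alloc(self, ppn, adjacent_ppns=()):
        """Model of the __pte_alloc hook: record a new L1PT page."""
        self.pt_rbtree.add(ppn)
        for bank, row in self.dram_of(ppn):
            banks = self.pt_row_rbtree.setdefault(row, {})
            entry = banks.setdefault(bank, {"pt_count": 0, "leak_count": 0})
            entry["pt_count"] += 1
        self.adj_of[ppn] = set(adjacent_ppns)
        self.adj_rbtree |= self.adj_of[ppn]

    def on_page_free(self, ppn):
        """Model of the __free_pages hook: drop a freed L1PT or adjacent page."""
        if ppn not in self.pt_rbtree:
            self.adj_rbtree.discard(ppn)
            for pages in self.adj_of.values():
                pages.discard(ppn)
            return
        self.pt_rbtree.discard(ppn)
        rows = set()
        for bank, row in self.dram_of(ppn):
            self.pt_row_rbtree[row][bank]["pt_count"] -= 1
            rows.add(row)
        for row in rows:
            # Delete the row node only when no L1PT page remains in any of its banks.
            if all(b["pt_count"] == 0 for b in self.pt_row_rbtree[row].values()):
                del self.pt_row_rbtree[row]
        freed_adj = self.adj_of.pop(ppn, set())
        still_needed = set().union(*self.adj_of.values())
        self.adj_rbtree -= freed_adj - still_needed


# Toy DRAM mapping for the demo: 4 banks, one page per (bank, row) slot.
collector = PageTableCollector(lambda ppn: [(ppn % 4, ppn // 4)])
collector.on_l1pt_alloc(8, adjacent_ppns=[4, 12])
collector.on_l1pt_alloc(9, adjacent_ppns=[12])
collector.on_page_free(8)
print(collector.pt_row_rbtree, collector.adj_rbtree)
```

Keying `pt_row_rbtree` by row and then by bank mirrors the node layout in Table I, so a single lookup by row index reaches every bank counter that a freed page affects.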
C. Adjacent Page Tracer
To trace memory accesses to allocated and potential adjacent pages at runtime, the adjacent page tracer leverages the page fault handler. (The inline-hook library used by the page table collector is available at https://github.com/cppcoffee/inl_hook.)
Page Fault Handler. A page fault is a type of hardware exception. Whenever a user access to a virtual page violates the access permissions dictated by some PTEs, a page fault is raised by the memory management unit (MMU) and control is transferred to the kernel, which invokes the page fault handler to handle the fault based on an error code. The error code is generated by hardware and has 7 defined flag bits (i.e., bits 0–5 and bit 15), as shown in Figure 2. For instance, a memory access to a virtual address that is marked as non-present in the PTE (i.e., the present bit is cleared) triggers a non-present page fault with the P bit in the error code set to 0. To handle this page fault, the page fault handler allocates a new physical page for the virtual address and marks the address as present in the PTE, the so-called demand paging.
Leverage Page Fault.
The adjacent page tracer can trace the memory access to a page by configuring flag bits in a PTE and hooking the page fault handler (i.e., the do_page_fault function in the kernel space). As the memory access can be a read, a write, or an instruction fetch, not every flag bit can be leveraged. For instance, a physical page becomes read-only when its corresponding PTE has the RW bit cleared. Once a write access to the page occurs, a page fault is generated with the W/R bit of the error code set to 1, but read accesses and instruction fetches are not captured. As such, we experimented with each flag bit, and the results show that both the present bit and the rsrv bit in a PTE can be used for tracing. Next, we discuss why the tracer chooses the rsrv bit rather than the present bit.

Particularly, configuring the present bit to trace the memory access causes kernel crashes, since the kernel actively checks the present bit in a lowest-level PTE in multiple cases. For instance, when a process forks a new child process, the kernel checks the present bit in the process's lowest-level PTEs. If one of the PTEs points to a physical page that is traced, the present bit in the PTE has been set to 0 by the tracer. When the kernel check encounters such a case, the kernel aborts. As the tracer is unaware of when the forking occurs, it cannot restore the present bit to 1 before the kernel check.

On top of that, we observe that a PTE in x86-64 has multiple rsrv bits, which are unused and set to 0 by default. An access to a page with one rsrv bit in the PTE set to 1 triggers a page fault and generates an error code with the RSVD bit set to 1, as shown in Figure 2. In contrast to the present bit check, the kernel does not perform any check against lowest-level PTEs' rsrv bits. For instance, if an adjacent page is part of a 2 MiB huge page, its lowest-level PTE is an L2PT entry and the kernel does not inspect any rsrv bits in the entry. As page table management is a core component of the kernel, its code logic remains relatively stable. Taking a recent stable Linux kernel version (i.e., 5.10.4) as an example, there is no check against any rsrv bits, either. This is probably because rsrv bits remain unused in lowest-level PTEs. In our implementation, the tracer chooses one rsrv bit (i.e., bit 51) in the PTE for configuration.
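The bit manipulation behind this tracing scheme can be illustrated with a small user-space model. This is only a sketch, not the kernel module itself: the helper names (arm_trace, disarm_trace, is_traced_fault) are hypothetical, the error-code bit positions follow the x86 page-fault error code layout (bits 0–5 and bit 15), and bit 51 is the reserved PTE bit named in the text.

```python
# Illustrative model only: real tracing happens inside a kernel module.
# Bit positions of the x86 page-fault error code (bits 0-5 and bit 15).
PF_P, PF_WR, PF_US, PF_RSVD, PF_ID, PF_PK = (1 << i for i in range(6))
PF_SGX = 1 << 15

# Reserved PTE bit that the tracer configures (bit 51, as stated in the text).
RSRV_BIT = 1 << 51


def arm_trace(pte: int) -> int:
    """Return the PTE with the reserved bit set, so the next access faults."""
    return pte | RSRV_BIT


def disarm_trace(pte: int) -> int:
    """Return the PTE with the reserved bit cleared, so the access can resume."""
    return pte & ~RSRV_BIT


def is_traced_fault(error_code: int) -> bool:
    """True if the fault reports a reserved bit set in a paging entry."""
    return bool(error_code & PF_RSVD)
```

In this model, a hooked fault handler would call is_traced_fault on the hardware error code and, for a traced fault, apply disarm_trace to the faulting PTE before returning to user space.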
Trace Adjacent Page.
Once the tracer has configured rsrv bits in the relevant PTEs pointing to the adjacent pages or the huge pages containing the adjacent pages, and flushed the desired TLB entries, any subsequent access to an adjacent page or its huge page triggers a page fault. As do_page_fault is hooked, the tracer captures a faulting (huge) page with an expected error code of RSVD and collects complete DRAM information from the faulting (huge) page. The tracer then updates leak_count of the L1PT pages that are adjacent to either the captured (huge) page or its L1PT page. As an L1PT page has multiple bank_struct entries, leak_count of each bank_struct for the adjacent L1PT page should be updated accordingly. If leak_count reaches a pre-determined limit, the row refresher is triggered (see Section IV-D).

We note that the tracer needs to clear the rsrv bit before transferring control back to the user space in order to resume the memory access. However, any subsequent access to the same adjacent page or its huge page is no longer traced, as the rsrv bit is cleared. To address this issue, the tracer sets up a periodic timer to configure the rsrv bit at a fixed interval and thus traces the accesses as frequently as possible. Specifically, when the timer fires, the tracer leverages the kernel's reverse mapping feature to translate a PPN in adj_rbtree to a set of virtual addresses, as a PPN can be mapped to multiple virtual addresses. For each address, the tracer performs a page-table walk, sets the rsrv bit in its lowest-level PTE, and flushes its cached TLB entry. Inside the reverse mapping, the tracer walks a linked list of allocated VMAs, performs a page-table walk for each virtual address candidate in a VMA, and finds the set of virtual addresses whose PPN matches a given PPN in adj_rbtree.

It is clearly inefficient to do the reverse mapping and page-table walk for every PPN in adj_rbtree on every timer tick. To improve the efficiency, on the first timer tick the tracer sets rsrv bits in the PTEs relevant to the pages in adj_rbtree and then frees the corresponding nodes in adj_rbtree. If page faults with the error code of RSVD occur, the tracer captures them and stores the faulting addresses' PTE information into a dedicated ring buffer (i.e., pte_ringbuf). On subsequent timer ticks, the tracer sets rsrv bits in the PTEs stored in pte_ringbuf and handles the remaining nodes in adj_rbtree.
For any free page that is allocated for the user space in the default page fault handler, the tracer checks whether its PPN or its L1PT page's PPN is adjacent to any PPN in pt_rbtree. If so, its PTE information is inserted into pte_ringbuf.

Particularly, pte_ringbuf maintains two pointers for updates, i.e., head and tail. If a new PTE is inserted into pte_ringbuf, the head pointer is updated and points to the node of the latest inserted PTE. If one PTE is removed from pte_ringbuf (i.e., its rsrv bit has been configured), the tail pointer is updated and points to the least recently inserted PTE. When head and tail point to the same ring buffer node, the buffer is empty. The ring buffer size is pre-determined empirically. When the number of nodes between the tail and head pointers is no less than 80% of the total number of nodes in the ring buffer, the tracer allocates a larger ring buffer (e.g., four times the old ring buffer size in our implementation), which stores newly inserted PTEs. The old ring buffer is freed when its stored PTEs have all been configured by the tracer (i.e., tail equals head).

As shown in Figure 3, the interval between timer ticks should be small enough to keep adjacent pages under close surveillance and thus update leak_count promptly. On the other hand, our system might experience unacceptable overhead if the timer fires too frequently and causes numerous context switches between user and kernel. To this end, we discuss how to decide the timer interval (denoted as timer_inr) in Section IV-E to keep SoftTRR's security guarantee while minimizing its performance impact.
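The head/tail bookkeeping and the 80% growth rule described for pte_ringbuf can be sketched as follows. This is a simplified user-space model under stated assumptions, not the module's code: the class names PteRingBuf and GrowingPteQueue are hypothetical, head here marks the next free slot (a common convention that differs slightly from the text, where head points at the latest inserted node), and "freeing" an old buffer is modeled by dropping it from a list.

```python
class PteRingBuf:
    """Fixed-size ring buffer; head is the next write slot, tail the next read slot."""

    def __init__(self, size):
        self.slots = [None] * size
        self.head = 0
        self.tail = 0

    def __len__(self):
        # Number of stored entries; 0 when head == tail (buffer empty).
        return (self.head - self.tail) % len(self.slots)

    def push(self, pte):
        if len(self) == len(self.slots) - 1:
            raise OverflowError("ring buffer full")
        self.slots[self.head] = pte
        self.head = (self.head + 1) % len(self.slots)

    def pop(self):
        if self.head == self.tail:
            raise IndexError("ring buffer empty")
        pte = self.slots[self.tail]
        self.tail = (self.tail + 1) % len(self.slots)
        return pte


class GrowingPteQueue:
    """Switches to a 4x larger buffer once the active one is at least 80% full."""

    GROW_AT = 0.8      # occupancy ratio from the text
    GROW_FACTOR = 4    # growth factor from the text

    def __init__(self, size):
        self.active = PteRingBuf(size)  # receives newly inserted PTEs
        self.retiring = []              # old buffers that still hold PTEs

    def insert(self, pte):
        size = len(self.active.slots)
        if len(self.active) >= self.GROW_AT * size:
            self.retiring.append(self.active)
            self.active = PteRingBuf(self.GROW_FACTOR * size)
        self.active.push(pte)

    def drain(self):
        """Model one timer tick: hand back all stored PTEs, oldest buffer first."""
        configured = []
        for buf in self.retiring + [self.active]:
            while len(buf):
                configured.append(buf.pop())
        self.retiring.clear()  # old buffers are empty now and can be freed
        return configured
```

With an initial size of 10, the ninth insertion finds eight stored entries, so it lands in a new 40-slot buffer while the old buffer is kept until the next drain.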
D. Row Refresher
Direct-physical Map.
Linux systems and paravirtualized hypervisors (e.g., Xen) map the whole available physical memory directly into the kernel space [20], [44] so that the kernel can access any data or code in the physical memory. Thus, every physical page allocated for the user space is mapped to at least two virtual pages, i.e., a user virtual page and a kernel virtual page, while a kernel physical page is mapped to a single kernel virtual page.
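Because the direct-physical map is a linear mapping, translating between a physical address and its kernel virtual address is a single addition or subtraction. The sketch below illustrates this under an assumption: the base address PAGE_OFFSET is a placeholder, since the actual base of the x86-64 direct map differs across kernel versions and is randomized when KASLR is enabled. The function names are hypothetical.

```python
# Assumed placeholder: the real direct-map base depends on the kernel version
# and is randomized under KASLR, so treat this value as illustrative only.
PAGE_OFFSET = 0xFFFF_8880_0000_0000
PAGE_SHIFT = 12  # 4 KiB pages


def phys_to_kernel_virt(phys: int) -> int:
    """Kernel virtual address of a physical address in the linear direct map."""
    return PAGE_OFFSET + phys


def kernel_virt_to_phys(virt: int) -> int:
    """Inverse translation for an address inside the direct map."""
    return virt - PAGE_OFFSET


def ppn_to_kernel_virt(ppn: int) -> int:
    """Kernel virtual address of the first byte of physical page number ppn."""
    return phys_to_kernel_virt(ppn << PAGE_SHIFT)
```

Inside the kernel, the same translation is provided by existing helpers, so a module would not hard-code the base as this sketch does.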
Refresh Desired Rows.
If leak_count in bank_struct reaches a pre-determined limit (denoted as count_limit), the row refresher refreshes the desired rows specified by the relevant bank_struct. As each node in pt_row_rbtree provides bank indexes and row indexes, the refresher leverages them to reconstruct a physical address. Based on the direct-physical map, the refresher finds a kernel virtual address mapped to the physical address. As a read access to a row automatically recharges the row and prevents potential bit flips, the refresher flushes the CPU caches of the kernel virtual address, reads the virtual address, and finally resets leak_count to 0.

If count_limit is set too small (i.e., 1), the refreshing cost may become unacceptable, as many unnecessary refreshes can be introduced by regular memory accesses to adjacent pages. If count_limit is too large, the refresher is unable to promptly refresh a row before it is flipped. As such, count_limit should be no less than 2 and we decide its value in the following section.

Figure 3. The adjacent page tracer sets up tracing to adjacent pages at a series of time points up to t_n, and the interval between two adjacent time points is timer_inr. The tracer captures the first memory access (highlighted in green) and ignores subsequent memory accesses in each interval of timer_inr and thus updates leak_count. Whenever leak_count reaches count_limit, the row refresher starts.

Figure 4. Minimal required ACTs per aggressor row against the refresh period (ms) for the DDR3-based system and the DDR4-based system. ∞ in a refresh period of 8 ms means that no bit flip is observed using a large number of 10,000 K ACTs.

E. Offline Profile
SoftTRR decides realistic and reasonable values of timer_inr and count_limit to keep its security and practicality principles. As illustrated in Figure 3, leak_count is updated on the first memory access to an adjacent page within each timer_inr. When the first memory access occurs right after the adjacent page tracer sets up tracing on each timer tick, an adversary can achieve the maximum time interval (denoted as threshold) for hammering, i.e., threshold = timer_inr × (count_limit − 1). As such, SoftTRR must ensure that no bit flip can be induced within the threshold. We can determine the threshold value based on the minimal number of activations. The time (tRC) between a pair of activations to the same row is determined by hardware and its value is usually around 50 nanoseconds [22].

Table II. No single bit flip is observed in 29 DRAM modules shown in the table when their DRAM refresh period is set to 8 ms. (∗: no hammer pattern has been discovered for the module.)
Columns: Mother Board, CPU, DRAM Module (Type, Vendor, Size, ...), Hammer Pattern
ASUS Z97-A, i7-4790, DDR3:
  ADATA  8 GiB  16  AD3X1600W8G11-B  2-sided Hammer
  Apacer  8 GiB  16  78.C1GET.DF10C
  Geil  8 GiB  16  CL11-11-11 D3-1600
  GoodRam  8 GiB  16  GR1333D364L9/8G
  G.Skill  4 GiB × × ∗
(Kabylake):
  Crucial  8 GiB  16  CT8G4DFS8213.8FA1  7-sided Hammer
  Crucial  8 GiB  32  16ATF1G64AZ-2G1A2  2-sided Hammer
  Hynix  8 GiB  32  HMA41GU6AFR8N-TF  12-sided Hammer
  Hynix  8 GiB  16  HMA81GU6DJR8N-VK  7-sided Hammer
  Hynix  8 GiB  16  HMA81GU6JJR8N-VK  6-sided Hammer
  Kingston  8 GiB  16  9905678-105.A00G  12-sided Hammer
  Kingston  8 GiB  32  99P5701-005.A00G  3-sided Hammer
  Ramaxel  8 GiB  16  RMUA5110MH78HAF-2666  7-sided Hammer
  Samsung  16 GiB  32  M378A2K43CB1-CRC  24-sided Hammer
  Team Group  8 GiB  16  TEAMGROUP-UD4-2666  8-sided Hammer
ASUS TUF B360M-PLUS GAMING S, i5-9400 (Coffeelake), DDR4:
  ADATA  8 GiB  16  AD4X240038G17-BP  5-sided Hammer
  Apacer  8 GiB  16  D12.2324WC.001  2-sided Hammer
  Crucial  8 GiB  16  BLS8G4D30AESCK.M8FE  4-sided Hammer
  Crucial  8 GiB  16  CT8G4DFS8266.C8FD1  2-sided Hammer
  Crucial  16 GiB  32  CT16G4DFD8266.16FH1  18-sided Hammer
  Kingston  8 GiB  8  99P5713-005.A00G  3-sided Hammer
  Klevv  8 GiB  16  KD48GU881-26N1900  5-sided Hammer
  Samsung  8 GiB  16  M378A1K43CB2-CTD  20-sided Hammer
Minimal
We summarize from previous works [11], [22], [21], [8], [46] that a minimal value of tREFI in BIOS). The test implementation is based on the key takeaways from Cojocar et al. [8]. Specifically, a hammer instruction sequence of two clflushopt alone is by far the most efficient on a few Intel servers that are equipped with Skylake or Cascade Lake and have multiple sockets [8]. For other machines that support clflushopt, the instruction sequence is two pairs of clflushopt with a memory load. If clflushopt is unavailable, the optimal one is two pairs of clflush with a memory load. To this end, we use 2-sided hammer (i.e., double-sided hammer, the most effective hammer pattern in DDR3 chips for triggering bit flips [33], [11]) against the DDR3-based system with 2 pairs of clflush and a memory load. For the DDR4-based system, 2-sided hammer is not effective, as recent DDR4 chips are hardened by the ChipTRR mitigation [11]. We leverage an open-source rowhammer fuzzer, called TRRespass (https://github.com/vusec/trrespass), to automatically discover the effective hammer pattern for a given DDR4 module. For our DDR4-based system, we test it using 3-sided hammer with 3 pairs of clflushopt and a memory load.

For both systems, there are 32 banks that have about 1000 K rows in total and they allow roughly 90% of their total memory for the rowhammer test. By using the hammer pattern and hammer instruction sequence, we have checked about 870 K rows in the DDR3-based system and 860 K rows in the DDR4-based system for bit flips. The reason why the DDR4-based system has a lower number of rows being checked is that its hammer pattern requires three aggressor rows.

For every given DRAM refresh period for each system, we use a pseudo binary search algorithm to find out the min. threshold and maintain the security guarantee of SoftTRR using the following empirical observation.
Set Threshold.
Firstly, we learn from previous works [22], [21] that bit flips in present DRAM-based systems can be eliminated when the DRAM refresh period is sufficiently short. Particularly, Kim et al. [22] performed an extensive rowhammer test of 129 modern DRAM modules and observed that the rowhammer vulnerability completely disappears when the DRAM refresh period is set to no more than 8 ms.

On top of that, we perform an extensive rowhammer test against 29 DRAM modules, including DDR3 and DDR4, shown in Table II. Our experimental results show that no bit flip occurs in the DRAM refresh period of 8 ms, which confirms their observation above. Specifically, the rowhammer test differs slightly from that of the aforementioned DDR3- and DDR4-based systems: in the DRAM refresh period of 8 ms, 90% of the total memory has been tested using the proper hammer pattern and hammer instruction sequence with a standard number of 1000 K ACTs per aggressor row. For each DDR4 module, we need to leverage TRRespass to discover its effective hammer pattern. However, some modules have been fuzzed for 48 hours and no bit flip is observed in the standard DRAM refresh period. In such cases, we increase their DRAM refresh periods and fuzz them again for 48 hours until a hammer pattern is discovered. For instance, a 7-sided hammer pattern is uncovered for the Hynix module with part number HMA81GU6DJR8N-VK when the DRAM refresh period is set to 256 ms. The Corsair module with part number CM4X8GF2400C16K2-CN has been fuzzed for 48 hours and no single bit flip is induced even in the maximum DRAM refresh period of 448 ms.

As a higher value of threshold indicates a lower performance overhead, threshold is set to 8 ms. We decide timer_inr as well as count_limit based on SoftTRR-induced performance costs. Specifically, we enumerate every possible value of timer_inr (e.g., 4 ms and 8 ms) and obtain a corresponding value of count_limit (e.g., 3 and 2).
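The relation threshold = timer_inr × (count_limit − 1) and the enumeration of parameter pairs can be checked with a short model. This is a sketch under simplifying assumptions: time is counted in whole milliseconds, the adversary's first access lands immediately after each re-arming of the trace, and the function names are hypothetical. The last helper only gives an upper bound on activations in the window, using the roughly 50 ns tRC quoted in the text.

```python
def candidate_pairs(threshold_ms, intervals_ms):
    """(timer_inr, count_limit) pairs with timer_inr * (count_limit - 1) == threshold."""
    pairs = []
    for inr in intervals_ms:
        if threshold_ms % inr == 0:
            pairs.append((inr, threshold_ms // inr + 1))
    return pairs


def worst_case_hammer_window(timer_inr, count_limit):
    """Time at which the refresh fires when every interval starts with an access.

    Tracing is re-armed at t = 0, timer_inr, 2 * timer_inr, ... and only the
    first access in each interval increments leak_count.
    """
    leak_count = 0
    t = 0
    while True:
        leak_count += 1  # first access of the interval that starts at time t
        if leak_count >= count_limit:
            return t     # the row refresher is triggered here
        t += timer_inr


def max_activations(threshold_ms, trc_ns):
    """Upper bound on row activations that fit into the threshold window."""
    return threshold_ms * 1_000_000 // trc_ns
```

For a threshold of 8 ms, the candidate intervals 1, 2, 4, and 8 ms pair with count_limit values 9, 5, 3, and 2; in this model each pair yields the same 8 ms worst-case window, so the choice among them comes down to overhead.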
For each pair of possible values, we measure performance impacts of SoftTRR using SPECint. timer_inr and count_limit are set to 8 ms and 2, respectively.
V. SECURITY EVALUATION
We now turn to evaluate the security effectiveness of SoftTRR on three different hardware configurations, summarized in Table III, all running Ubuntu 16.04.

We deploy SoftTRR into each system against one representative kernel privilege escalation attack, i.e., Memory Spray [33] that hammers user memory adjacent to L1PTEs, CATTmew [7] that hammers a device driver buffer adjacent to L1PTEs, and PThammer [47] that implicitly hammers L1PTEs adjacent to other L1PTEs. Both Memory Spray and CATTmew are explicit rowhammer attacks with two different types of memory accessible to unprivileged users. PThammer is the only implicit rowhammer attack.
A. Defeat Memory Spray
Background.
The Memory Spray [33] is the first rowhammer attack targeting L1PTs. It is a probabilistic attack, as it sprays numerous L1PT pages into the memory with the hope that some L1PT pages are placed onto victim rows that are adjacent to attacker-controlled rows. As such, exploitable bits in L1PTEs can be flipped, which enables kernel privilege escalation.
Evaluation Details.
We test the effectiveness of SoftTRR against the Memory Spray on the Dell Optiplex 390. On this machine, traditional hammer patterns (e.g., 2-sided hammer) cannot trigger any bit flips and instead we use the 3-sided hammer identified by TRRespass. We first conduct 3-sided hammer to randomly identify m (e.g., 50 in our evaluation) vulnerable pages that have reproducible bit flips, that is, a vulnerable page has at least one victim physical address (Pc) and hammering three aggressor addresses Pa, Pb and Pd will flip bits in Pc.

We then optimize the attack by using the kernel privilege to put page tables onto vulnerable pages in a deterministic way. Specifically, we spray m pages of L1PTs by creating a virtual memory region of m MiB, ask the kernel to copy the content of the m pages of L1PTs into the m vulnerable pages, which are then used to translate the virtual memory region. The vulnerable pages now contain L1PTs and the original L1PTs are removed. By doing so, an attacker will definitely corrupt any of the L1PT pages by hammering the three relevant aggressor addresses. When SoftTRR is enabled with the m pages of L1PTs protected, we re-start the optimized attack for m hours (one-hour hammer for one vulnerable L1PT page) and observe no single bit flip in those m pages of L1PTs by checking their integrity, indicating that the Memory Spray attack has been successfully defeated.

Table III. Each rowhammer attack targets m (e.g., 50 in our experiments) victim pages of L1PTEs for hammer. With SoftTRR enabled, the three attacks fail to induce bit flips in these pages, indicating that these attacks have been mitigated.
Machine Model | CPU Arch. | CPU Model | DRAM (Part No.) | Attack | SoftTRR: m Targeted Victim Pages Bit Flip Failed?
Dell Optiplex 390 | KabyLake | i7-7700k | Kingston DDR4 (99P5701-005.A00G) | Memory Spray [33] | ✓
Dell Optiplex 990 | SandyBridge | i5-2400 | Samsung DDR3 (M378B5273DH0-CH9) | CATTmew [7] | ✓
Thinkpad X230 | IvyBridge | i5-3230M | Samsung DDR3 (M471B5273DH0-CH9) | PThammer [47] | ✓
B. Defeat CATTmew
Background.
As mentioned in Section II, CATT [6] enforces physical user-kernel isolation. CATTmew [7] breaks CATT's security guarantee by identifying device (e.g., SCSI Generic) driver buffers that are kernel memory but can be accessed by unprivileged users. CATTmew sprays L1PT pages to neighbor the driver buffers for hammer, with the hope that these L1PT pages reside in rows prone to bit flips.
Evaluation Details.
We first search for m vulnerable pages on the Dell Optiplex 990 using 2-sided hammer. Thus, a vulnerable page has at least one victim physical address (Pb) and hammering two aggressor addresses (Pa and Pc) will flip bits in Pb.

We then rely on the kernel privilege to convert CATTmew into a deterministic attack. Specifically, we spray m L1PT pages and copy their entries onto the m vulnerable pages as we did in the optimized Memory Spray attack. On top of that, we apply for the SCSI Generic (SG) buffer using Linux user APIs. In this test machine, we can apply as large as MiB and only m KiB of the SG buffer is enough. We instruct the kernel to copy the allocated SG buffer's content into the m aggressor pages and change the buffer's address mappings accordingly. To this end, hammering the buffer will induce bit flips in the vulnerable L1PT pages. However, when SoftTRR is set active, no single bit flip has been observed in those L1PT pages after m hours of hammering, indicating that SoftTRR is effective in defeating the CATTmew attack.
C. Defeat PThammer
Background.
Rowhammer attacks before PThammer [47] are explicit rowhammer attacks that require access to an exploitable hammer row (e.g., adjacent to an L1PT row), that is, part of the memory in the hammer row should be available to the attacker. PThammer voids the above requirement. By spraying L1PT pages and placing some onto victim rows with a high probability, PThammer exploits the page-table walk to produce frequent loads of some L1PTEs from aggressor rows (i.e., "hammering L1PTEs"), which induces bit flips in other L1PTEs of victim rows.
Evaluation Details.
We optimize PThammer using the kernel privilege to present a more efficient and deterministic attack on the Thinkpad X230. Specifically, PThammer uses eviction sets to flush TLB entries and CPU caches of L1PTEs, making the flush probabilistic. In our test, PThammer is allowed to instruct the kernel to do the flush through privileged instructions. Compared to the regular 2-sided hammer that flushes CPU caches in the user space, trapping into the kernel for flushing cache is much less efficient for hammer, making it harder to find vulnerable pages. To address this issue, we add a certain number of NOP instructions (e.g., 180) in each round of the 2-sided hammer so as to match the time cost of the 2-sided kernel-assisted hammer. By doing so, we discover m vulnerable pages that are flippable to the kernel-assisted PThammer.

As PThammer massages L1PTEs onto vulnerable pages with a probability, we instead spray m L1PT pages by creating a virtual memory region of m MiB. We then ask the kernel to copy all entries of the L1PT pages into the m vulnerable pages and the m aggressor pages. The kernel then changes the address mappings of the created virtual memory. When addresses in the virtual memory are accessed frequently, their page-table walk will hammer their corresponding L1PT pages and induce bit flips in the vulnerable L1PT pages. In comparison, we enable SoftTRR before starting the PThammer. After m hours of hammering, no bit flip occurs, showing that SoftTRR has mitigated PThammer as well.
VI. PERFORMANCE EVALUATION
We evaluate the performance impacts, including memory consumption, induced by SoftTRR. The experiments are conducted on a DDR4-based system running Ubuntu 16.04 on a Dell desktop with an Intel i7-7700K and Hynix 8 GiB DDR4 (part number: HMA81GU6DJR8N-VK). By default, SoftTRR supports an adjacent row that can be either one row or two rows away from an L1PT-page row, denoted by ∆{1,2}. In comparison, we also measure its impacts in the scenario of only one-row distance that previous works (e.g., [23]) assume, denoted by ∆{1}. We also validate the system robustness of SoftTRR using the Linux Test Project. The experimental results show that SoftTRR in both scenarios of ∆{1} and ∆{1,2} incurs modest overhead and does not affect the stability of the protected system, making it practical.

Table IV. CPU computation and memory benchmark results from SPECint. The averaged overhead induced by SoftTRR in ∆{1,2} (i.e., either one- or two-row adjacency, by default) and ∆{1} (i.e., one-row adjacency) is below 4%.
Programs | SoftTRR Overhead ∆{1} | SoftTRR Overhead ∆{1,2} (default)
perlbench | 2.66% | 5.32%
bzip2 | 0.33% | 0.33%
gcc | 4.76% | 6.55%
mcf | -0.17% | 2.94%
gobmk | 0.68% | 0.68%
hmmer | 1.33% | 0.89%
sjeng | 0.31% | 0.61%
libquantum | 1.95% | 0.98%
h264ref | 0.32% | 0.00%
omnetpp | 8.19% | 6.90%
astar | 2.60% | 5.20%
xalancbmk | 12.78% | 11.94%
Mean | 2.85% | 3.53%
A. Benchmark Runtime Overhead
We measure SoftTRR-induced runtime overhead using two popular benchmarks, i.e., SPECint and Phoronix.

SPECint is an industry-standard benchmark suite intended for measuring the performance of the CPU and memory. For this suite, we launch integer programs with a specific configuration file (i.e., linux64-amd64-gcc43+.cfg) and summarize the benchmark results in Table IV. As we can see from the table, the overhead of ∆{1,2} (i.e., 3.53%) is a little higher than that of ∆{1} (i.e., 2.85%).

Phoronix is a free and open-source benchmark software for mainstream OSes (e.g., Linux, MacOS and Windows). It allows for testing performance overhead against common applications in an automated manner. As this suite has a large number of programs testing different aspects of a system, we select a subset of the available programs to stress-test the performance of CPU, memory, network I/O and disk I/O, similar to previous software-only rowhammer defenses (e.g., [6], [43]). As shown in Table V, the average performance overhead is 0.94% for ∆{1} and 0.78% for ∆{1,2}, respectively, indicating that the Phoronix overhead is negligible in both scenarios.
B. LAMP Runtime Memory Consumption
We use a real-world use case to measure the runtime memory consumption of SoftTRR, that is, a LAMP server (i.e., Linux, Apache, MySQL and PHP). We also run a common tool (i.e., Nikto [38]) on another machine for 60 minutes to stress-test the LAMP server. Nikto [38] is a web server scanner that tests the LAMP server for insecure files and outdated server software. It also carries out generic and server-type-specific checks.

Table V. CPU computation, memory operations and disk I/O benchmark results from Phoronix (https://github.com/phoronix-test-suite/phoronix-test-suite). The averaged overhead induced by SoftTRR in ∆{1} and ∆{1,2} is within 1%.
Programs | SoftTRR Overhead ∆{1} | SoftTRR Overhead ∆{1,2}
Apache | 1.98% | 3.71%
unpack-linux | 2.06% | 3.78%
iozone | 0.00% | 0.00%
postmark | 1.11% | 2.24%
stream:Copy | -1.89% | -0.36%
stream:Scale | 1.02% | -0.11%
stream:Triad | 0.81% | -0.27%
stream:Add | 1.12% | -0.25%
compress-7zip | 3.99% | 1.81%
openssl | 1.70% | 1.54%
pybench | 0.62% | 0.31%
phpbench | 0.48% | 0.60%
cacheben:read | 0.52% | 0.43%
cacheben:write | 0.52% | 0.11%
cacheben:modify | 0.55% | 0.14%
ramspeed:INT | 0.72% | -0.19%
ramspeed:FP | 0.63% | -0.14%
Mean | 0.94% | 0.78%
The memory cost induced by SoftTRR within the 60 minutes is shown in Figure 5. The memory consumption is the total memory size of three red-black trees (i.e., pt_rbtree, pt_row_rbtree and adj_rbtree) and the ring buffer (i.e., pte_ringbuf). We note that the pre-allocated pte_ringbuf is 396 KiB.
Protected and Traced Page Number.
When computing the memory consumption, we also collect the unique page numbers that SoftTRR protects and traces, respectively. As shown in Figure 6, both the protected L1PT page number and the traced adjacent page number in either ∆{1} or ∆{1,2} gradually increase, probably because system activities including user processes trigger creations of L1PT pages and adjacent pages. Besides, in ∆{1}, the averaged adjacent page number is 3120, roughly three times the averaged L1PT page number of 920. In ∆{1,2}, the averaged adjacent page number is 4937, roughly four times the averaged L1PT page number of 1219. This is probably because the adjacent row number in ∆{1,2} is twice that in ∆{1} and a row can have multiple page numbers.
C. System Robustness
To evaluate the robustness of our test system after deploying SoftTRR, we select 20 system calls of different types and perform stress tests for each selected system call on both the vanilla system and the SoftTRR-based system. The stress tests come from the Linux Test Project (LTP, https://github.com/linux-test-project/ltp) and they are used to identify system problems. As can be seen from Table VI, the stress test results clearly show that there is no deviation for the SoftTRR-based system compared to the vanilla system. Also, we do not observe any issue when executing previous benchmarks. As a result, the test system runs stably with SoftTRR enabled.

Figure 5. The memory consumption (KiB) over time (mins) in the LAMP production environment caused by SoftTRR. The required memory for SoftTRR in both ∆{1} and ∆{1,2} increases gradually and reaches a relatively stable level in the last 10 minutes. The memory cost in each scenario is within 650 KiB.

Figure 6. Protected L1PT page number (dashed line) and traced adjacent page number (solid line). Compared to ∆{1}, both the protected L1PT page number and the traced adjacent page number in ∆{1,2} are higher, as SoftTRR in ∆{1,2} collects more. Besides, the traced adjacent page numbers are much higher than the protected L1PT page numbers in both scenarios, since an L1PT-page row can have two adjacent rows in ∆{1} and four adjacent rows in ∆{1,2}.

VII. DISCUSSION
Root Privilege Escalation Attack.
A rowhammer root privilege escalation attack is one in which an unprivileged adversary gains root privilege by corrupting opcodes of a setuid process [13]. This well-known root privilege escalation attack on x86 has already been effectively and efficiently defeated by RIP-RH [3], which physically isolates sensitive user processes. In addition, SoftTRR can also be extended to defend against this attack. As described in Section III-C, SoftTRR treats page tables as protected objects. Thus, a trusted user can pass specified objects (i.e., binary code of setuid processes) to SoftTRR through a provided user API and SoftTRR uses the same mechanism to protect those objects.

Table VI. Stress test results of 20 system calls from Linux Test Project for SoftTRR in both ∆{1} and ∆{1,2}.
Linux Test Project | System Call | Vanilla System | SoftTRR ∆{1} | SoftTRR ∆{1,2} (default)
File | open | ✓ | ✓ | ✓
File | close | ✓ | ✓ | ✓
File | ftruncate | ✓ | ✓ | ✓
File | rename | ✓ | ✓ | ✓
Network | Listen | ✓ | ✓ | ✓
Network | Socket | ✓ | ✓ | ✓
Network | Send | ✓ | ✓ | ✓
Network | Recv | ✓ | ✓ | ✓
Memory | mmap | ✓ | ✓ | ✓
Memory | munmap | ✓ | ✓ | ✓
Memory | brk | ✓ | ✓ | ✓
Memory | mlock | ✓ | ✓ | ✓
Memory | munlock | ✓ | ✓ | ✓
Memory | mremap | ✓ | ✓ | ✓
Process | getpid | ✓ | ✓ | ✓
Process | exit | ✓ | ✓ | ✓
Process | clone | ✓ | ✓ | ✓
Misc. | ioctl | ✓ | ✓ | ✓
Misc. | prctl | ✓ | ✓ | ✓
Misc. | vhangup | ✓ | ✓ | ✓
DMA-based Kernel Privilege Escalation Attack.
There is no existing DMA-based kernel privilege escalation attack on x86. The well-known kernel privilege escalation attack on ARM is Drammer [40], and it has been defeated by GuardION [41], which enforces DMA memory isolation. If such attacks on x86 prove to be feasible in the future, we can address them in the following two ways. One is to integrate SoftTRR with existing orthogonal defenses. In particular, ALIS [39] on x86 physically isolates DMA memory using guard rows, and bit flips are thus confined to the DMA memory of attackers.

Alternatively, SoftTRR can leverage the IOMMU [17] to monitor remote accesses to DMA memory by configuring I/O page tables, similar to MMU-based page tables. Specifically, SoftTRR collects (I/O) page tables and their adjacent DMA memory pages that are allocated to users. By configuring I/O page tables, SoftTRR can trace accesses to the collected DMA pages. When the IOMMU is widely available on the x86 platform, we believe that SoftTRR can leverage it to defend (I/O) page tables against unknown DMA-based rowhammer attacks.
Level-1 and Higher Level Page Table.
Existing kernel privilege escalation attacks focus on corrupting level-1 page tables (L1PTs), and there is no demonstrated attack that has successfully exploited higher-level page tables [43]. If such an attack becomes feasible in the future, we can easily extend SoftTRR to protect higher-level page tables. For instance, when SoftTRR is extended to protect L2PT pages, SoftTRR collects desired user pages if they or their corresponding L1PT or L2PT pages are adjacent to either L1PT or L2PT pages. SoftTRR traces the collected user pages by setting rsrv bits in their lowest-level PTEs and refreshes relevant page-table pages when necessary. Since the number of higher-level PT pages is significantly smaller than the number of L1PT pages (e.g., an L2PT page can point to up to 512 L1PT pages), we believe that the additional performance overhead will not be high.
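The 512-to-1 ratio between L1PT and L2PT pages can be made concrete with a short calculation. The sketch below is illustrative only and rests on assumptions: 4 KiB pages, 512 entries per page-table page (as on x86-64), and a single contiguous, aligned mapping; the function name is hypothetical.

```python
ENTRIES_PER_TABLE = 512  # entries in one x86-64 page-table page
PAGE_SIZE = 4096         # bytes mapped by one L1PT entry


def table_pages(mapped_bytes):
    """Page-table pages per level needed to map one contiguous, aligned region."""
    bytes_per_l1pt = ENTRIES_PER_TABLE * PAGE_SIZE  # 2 MiB per L1PT page
    l1 = -(-mapped_bytes // bytes_per_l1pt)         # ceiling division
    l2 = -(-l1 // ENTRIES_PER_TABLE)
    return {"L1PT": l1, "L2PT": l2}
```

Under these assumptions, mapping 1 GiB needs 512 L1PT pages but a single L2PT page, which is consistent with the expectation that protecting higher-level page tables adds far fewer protected rows.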
Cross-platform Support.
SoftTRR, by its design, leverages the MMU-based memory subsystem of the Linux kernel for page table protection and thus works on mainstream architectures and DRAM modules. Although its current implementation works on x86-based systems where either DDR3 or DDR4 chips are used, we believe that SoftTRR can work on other hardware platforms (e.g., ARM and LPDDR3) where the Linux system is supported.
VIII. CONCLUSION
In this paper, we proposed a software-only defense, named SoftTRR, that protects level-1 page tables against rowhammer attacks on x86. SoftTRR is a loadable kernel module and is compatible with commodity Linux systems without requiring any kernel modification.

We evaluated the security effectiveness of a SoftTRR-enabled system using three kernel privilege escalation attacks. Also, we measured SoftTRR's performance overhead, memory cost, and stability using multiple benchmark suites and a real-world use case. The experimental results indicate that SoftTRR is effective in defending against all the mentioned attacks and practical in incurring low performance overhead and memory cost. Besides, it does not affect the system stability.
REFERENCES
[1] Apple, Inc. About the security content of mac efi security update 2015-001. https://support.apple.com/en-au/HT204934, August 2015.
[2] Zelalem Birhanu Aweke, Salessawi Ferede Yitbarek, Rui Qiao, Reetuparna Das, Matthew Hicks, Yossi Oren, and Todd Austin. ANVIL: Software-based protection against next-generation rowhammer attacks. In Architectural Support for Programming Languages and Operating Systems, pages 743–755, 2016.
[3] Carsten Bock, Ferdinand Brasser, David Gens, Christopher Liebchen, and Ahmad-Reza Sadeghi. RIP-RH: Preventing rowhammer-based inter-process attacks. In Asia Conference on Computer and Communications Security, pages 561–572, 2019.
[4] Jeff Bonwick. The slab allocator: An object-caching kernel memory allocator. In USENIX summer, 1994.
[5] Erik Bosman, Kaveh Razavi, Herbert Bos, and Cristiano Giuffrida. Dedup est machina: memory deduplication as an advanced exploitation vector. In IEEE Symposium on Security and Privacy, pages 987–1004, 2016.
[6] Ferdinand Brasser, Lucas Davi, David Gens, Christopher Liebchen, and Ahmad-Reza Sadeghi. CAn't Touch This: Software-only mitigation against rowhammer attacks targeting kernel memory. In USENIX Security Symposium, 2017.
[7] Yueqiang Cheng, Zhi Zhang, Surya Nepal, and Zhi Wang. CATTmew: Defeating software-only physical kernel isolation. IEEE Transactions on Dependable and Secure Computing, 2019.
[8] Lucian Cojocar, Jeremie Kim, Minesh Patel, Lillian Tsai, Stefan Saroiu, Alec Wolman, and Onur Mutlu. Are we susceptible to rowhammer? an end-to-end methodology for cloud providers. In IEEE Symposium on Security and Privacy, May 2020.
[9] Lucian Cojocar, Kaveh Razavi, Cristiano Giuffrida, and Herbert Bos. Exploiting correcting codes: on the effectiveness of ECC memory against rowhammer attacks. In IEEE Symposium on Security and Privacy, pages 55–71, 2019.
[10] Jonathan Corbet. Trees ii: red-black trees. https://lwn.net/Articles/184495/, 2006.
[11] Pietro Frigo, Emanuele Vannacci, Hasan Hassan, Victor van der Veen, Onur Mutlu, Cristiano Giuffrida, Herbert Bos, and Kaveh Razavi. TRRespass: Exploiting the many sides of target row refresh. In IEEE Symposium on Security and Privacy, 2020.
[12] Mohamad Gebai and Michel R Dagenais. Survey and analysis of kernel and userspace tracers on linux: Design, implementation, and overhead. ACM Computing Surveys, pages 1–33, 2018.
[13] Daniel Gruss, Moritz Lipp, Michael Schwarz, Daniel Genkin, Jonas Juffinger, Sioli O'Connell, Wolfgang Schoechl, and Yuval Yarom. Another flip in the wall of rowhammer defenses. In IEEE Symposium on Security and Privacy, pages 245–261, 2018.
[14] Daniel Gruss, Clémentine Maurice, and Stefan Mangard. Program for testing for the DRAM rowhammer problem using eviction. https://github.com/IAIK/rowhammerjs, May 2017.
[15] Hasan Hassan, Nandita Vijaykumar, Samira Khan, Saugata Ghose, Kevin Chang, Gennady Pekhimenko, Donghyuk Lee, Oguz Ergin, and Onur Mutlu. Softmc: A flexible and practical open-source infrastructure for enabling experimental dram studies. In High Performance Computer Architecture
International Symposium on Computer Architecture, 2020.
[22] Yoongu Kim, Ross Daly, Jeremie Kim, Chris Fallin, Ji Hye Lee, Donghyuk Lee, Chris Wilkerson, Konrad Lai, and Onur Mutlu. Flipping bits in memory without accessing them: an experimental study of DRAM disturbance errors. In International Symposium on Computer Architecture, pages 361–372, 2014.
[23] Radhesh Krishnan Konoth, Marco Oliverio, Andrei Tatar, Dennis Andriesse, Herbert Bos, Cristiano Giuffrida, and Kaveh Razavi. ZebRAM: comprehensive and compatible software protection against rowhammer attacks. In
OperatingSystems Design and Implementation , pages 697–710, 2018.[24] Anil Kurmus, Sergej Dechand, and R¨udiger Kapitza. Quantifi-able run-time kernel attack surface reduction. In
InternationalConference on Detection of Intrusions and Malware, andVulnerability Assessment , pages 212–234, 2014.[25] Anil Kurmus, Alessandro Sorniotti, and R¨udiger Kapitza.Attack surface reduction for commodity os kernels: trimmedgarden plants may attract less bugs. In
Proceedings of theFourth European Workshop on System Security , pages 1–6,2011.[26] Andrew Kwong, Daniel Genkin, Daniel Gruss, and YuvalYarom. RAMBleed: Reading bits in memory without ac-cessing them. In
IEEE Symposium on Security and Privacy ,2020.[27] Eojin Lee, Ingab Kang, Sukhan Lee, G Edward Suh, andJung Ho Ahn. TWiCe: preventing row-hammering by ex-ploiting time window counters. In
International Symposiumon Computer Architecture
USENIX Security Symposium , 2007.[31] Peter Pessl, Daniel Gruss, Cl´ementine Maurice, MichaelSchwarz, and Stefan Mangard. DRAMA: Exploiting DRAMaddressing for cross-CPU attacks. In
USENIX SecuritySymposium , pages 565–581, 2016. [32] Mark Seaborn. How physical addresses map to rows andbanks in dram. http://lackingrhoticity.blogspot.com.au/2015/05/how-physical-addresses-map-to-rows-and-banks.html,2015.[33] Mark Seaborn and Thomas Dullien. Exploiting the DRAMrowhammer bug to gain kernel privileges. In
Black Hat’15 ,2015.[34] Seyed Mohammad Seyedzadeh, Alex K Jones, and RamiMelhem. Counter-based tree structure for row hammeringmitigation in DRAM.
IEEE Computer Architecture Letters ,16(1):18–21, 2016.[35] Seyed Mohammad Seyedzadeh, Alex K Jones, and RamiMelhem. Mitigating wordline crosstalk using adaptive treesof counters. In
International Symposium on Computer Archi-tecture , pages 612–623, 2018.[36] Mungyu Son, Hyunsun Park, Junwhan Ahn, and SungjooYoo. Making DRAM stronger against row hammering. In
Design Automation Conference
USENIX Annual Technical Conference , 2018.[40] Victor van der Veen, Yanick Fratantonio, Martina Lindorfer,Daniel Gruss, Cl´ementine Maurice, Giovanni Vigna, HerbertBos, Kaveh Razavi, and Cristiano Giuffrida. Drammer:Deterministic rowhammer attacks on mobile platforms. In
ACM SIGSAC Conference on Computer and CommunicationsSecurity , pages 1675–1689, 2016.[41] Victor van der Veen, Martina Lindorfer, Yanick Fratantonio,Harikrishnan Padmanabha Pillai, Giovanni Vigna, ChristopherKruegel, Herbert Bos, and Kaveh Razavi. Guardion: Practicalmitigation of dma-based rowhammer attacks on arm. In
Inter-national Conference on Detection of Intrusions and Malware,and Vulnerability Assessment , pages 92–113. Springer, 2018.[42] Minghua Wang, Zhi Zhang, Yueqiang Cheng, and SuryaNepal. Dramdig: A knowledge-assisted tool to uncover dramaddress mapping. In
Design Automation Conference , 2020.[43] Xin-Chuan Wu, Timothy Sherwood, Frederic T. Chong, andYanjing Li. Protecting page tables from rowhammer attacksusing monotonic pointers in DRAM true-cells. In
Architec-tural Support for Programming Languages and OperatingSystems , pages 645–657, 2019.[44] xenbits.xen.org. source code (page.h). http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;hb=refs/heads/stable-4.3;f=xen/include/asm-x86/x86 64/page.h, 2009.[45] Yuan Xiao, Xiaokuan Zhang, Yinqian Zhang, and RaduTeodorescu. One bit flips, one cloud flops: Cross-VM rowhammer attacks and privilege escalation. In
USENIX SecuritySymposium , pages 19–35, 2016.46] Zhenkai Zhang, Zihao Zhan, Daniel Balasubramanian, Bo Li,Peter Volgyesi, and Xenofon Koutsoukos. Leveraging EMside-channel information to detect rowhammer attacks. In
IEEE Symposium on Security and Privacy , 2020.[47] Zhi Zhang, Yueqiang Cheng, Dongxi Liu, Surya Nepal, ZhiWang, and Yuval Yarom. Pthammer: Cross-user-kernel-boundary rowhammer through implicit accesses. In