Is this you? Create Your Porfile

Amro Awad

North Carolina State University

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Amro Awad is active.

Explore More

Publication

Featured researches published by Amro Awad.

international conference on supercomputing | 2016

Write-Aware Management of NVM-based Memory Extensions

Amro Awad; Sergey Blagodurov; Yan Solihin

Emerging Non-Volatile Memory (NVM) technologies, such as 3D XPoint, are expected to be in production as early as 2016. Emerging NVMs are very attractive for several reasons. First, they are non-volatile and hence incur no refresh power. Second, they are dense and promising for scaling down further. Finally, they are fast and have latencies comparable to DRAM. On the other side, using emerging NVMs as direct replacement for DRAM as the main memory is challenging. Compared to DRAM, emerging NVMs can endure a very limited number of writes per cell. Furthermore, their write latency is typically much slower and more energy consuming than DRAM, e.g., Phase Change Memory (PCM) writes are multiple of times slower than that of DRAM. An important use case for emerging NVMs is using them as fast memory extensions. Memory extensions are hidden from programmers and managed by the Operating System (OS). Any access to pages held in the memory extension will cause a page fault. Later, the memory manager moves the faulting page to DRAM and maps the page. While similar in concept to the swap file, memory extensions bypass the file system. Furthermore, memory extensions are dedicated for being used as memory and hence avoid contention with the file system. In this paper, we emulate an NVM-based memory extension and study its impact on performance on a real system. We also study how to improve its performance using OS-level prefetching. We show the importance of having the system software and the NVM controller work in concert for reducing the number of writes. Our best scheme where the system software and the NVM controller work in concert could reduce the number of writes to only 5% of the original baseline (increasing its lifetime by 20x).

international symposium on computer architecture | 2017

ObfusMem: A Low-Overhead Access Obfuscation for Trusted Memories

Amro Awad; Yipeng Wang; Deborah Shands; Yan Solihin

Trustworthy software requires strong privacy and security guarantees from a secure trust base in hardware. While chipmakers provide hardware support for basic security and privacy primitives such as enclaves and memory encryption. these primitives do not address hiding of the memory access pattern, information about which may enable attacks on the system or reveal characteristics of sensitive user data. State-of-the-art approaches to protecting the access pattern are largely based on Oblivious RAM (ORAM). Unfortunately, current ORAM implementations suffer from very significant practicality and overhead concerns, including roughly an order of magnitude slowdown, more than 100% memory capacity overheads, and the potential for system deadlock. Memory technology trends are moving towards 3D and 2.5D integration, enabling significant logic capabilities and sophisticated memory interfaces. Leveraging the trends, we propose a new approach to access pattern obfuscation, called ObfusMem. ObfusMem adds the memory to the trusted computing base and incorporates cryptographic engines within the memory. ObfusMem encrypts commands and addresses on the memory bus, hence the access pattern is cryptographically obfuscated from external observers. Our evaluation shows that ObfusMem incurs an overhead of 10.9% on average, which is about an order of magnitude faster than ORAM implementations. Furthermore, ObfusMem does not incur capacity overheads and does not amplify writes. We analyze and compare the security protections provided by ObfusMem and ORAM, and highlight their differences.

international symposium on performance analysis of systems and software | 2015

Non-volatile memory host controller interface performance analysis in high-performance I/O systems

Amro Awad; Brett Kettering; Yan Solihin

Emerging non-volatile memories (NVMs), such as Phase-Change Memory (PCM), Spin-Transfer Torque RAM (STT-RAM) and Memristor, are very promising candidates for replacing NAND-Flash Solid-State Drives (SSDs) and Hard Disk Drives (HDDs) for many reasons. First, their read/write latencies are orders of magnitude faster. Second, some emerging NVMs, such as memristors, are expected to have very high densities, which allow deploying a much higher capacity without requiring increased physical space. While the percentage of the time taken for data movement over low-speed buses, such as Peripheral Component Interconnect (PCI), is negligible for the overall read/write latency in HDDs, it could be dominant for emerging fast NVMs. Therefore, the trend has moved toward using very fast interconnect technologies, such as PCI Express (PCIe) which is hundreds of times faster than the traditional PCI. Accordingly, new host controller interfaces are used to communicate with I/O devices to exploit the parallelism and low-latency features of emerging NVMs through high-speed interconnects. In this paper, we investigate the system performance bottlenecks and overhead of using the standard state-of-the-art Non-Volatile Memory Express (NVMe), or Non-Volatile Memory Host Controller Interface (NVMHCI) Specification [1] as representative for NVM host controller interfaces.

international conference on parallel architectures and compilation techniques | 2017

Avoiding TLB Shootdowns Through Self-Invalidating TLB Entries

Amro Awad; Arkaprava Basu; Sergey Blagodurov; Yan Solihin; Gabriel H. Loh

Updates to a processs page table entry (PTE) renders any existing copies of that PTE in any of a systems TLBs stale. To prevent a process from making illegal memory accesses using stale TLB entries, the operating system (OS) performs a costly TLB shootdown operation. Rather than explicitly issuing shootdowns, we propose a coordinated TLB and page table management mechanism where an expirationtime is associated with each TLB entry. An expired TLB entry is treated as invalid. For each PTE, the OS then tracks the latest expiration time of any TLB entry potentially caching that PTE. No shootdown is issued if the OS modifies a PTE when its corresponding latest expiration time has already passed.In this paper, we explain the hardware and OS support required to support Self-invalidating TLB entries (SITE). As an emerging use case that needs fast TLB shootdowns, we consider memory systems consisting of different types of memory (e.g., faster DRAM and slower non-volatile memory) where aggressive migrations are desirable to keep frequently accessed pages in faster memory, but pages cannot migratetoo often because each migration requires a PTE update and corresponding TLB shootdown. We demonstrate that such heterogeneous memory systems augmented with SITE can allow an average performance improvement of 45.5% over a similar system with traditional TLB shootdowns by avoiding more than 65% of the shootdowns.

Ipsj Transactions on System Lsi Design Methodology | 2016

Accurate Cloning of the Memory Access Behavior

Amro Awad; Ganesh Balakrishnan; Yipeng Wang; Yan Solihin

While customizing the memory system design or picking the most fitting design for applications is very critical, many software vendors refrain from releasing their software for several reasons. First, many applications are proprietary, hence releasing them to hardware architects or vendors is not desired. Second, applications such as defense and nuclear simulations are very sensitive, hence accessing them is very restricted. Nonetheless, customizing the hardware for such applications is still important and highly desired. Workload cloning is the technique of generating synthetic clones from the original workload. The clones mimic the memory access behavior of the original workloads, hence enable exploring the design space with high level of accuracy. In this article, we survey the state-of-art cloning techniques of the memory access behavior and their uses.

high-performance computer architecture | 2014