Is this you? Create Your Porfile

Nadav Amit

Technion – Israel Institute of Technology

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Nadav Amit is active.

Explore More

Publication

Featured researches published by Nadav Amit.

architectural support for programming languages and operating systems | 2012

ELI: bare-metal performance for I/O virtualization

Abel Gordon; Nadav Amit; Nadav Har'El; Muli Ben-Yehuda; Alex Landau; Assaf Schuster; Dan Tsafrir

Direct device assignment enhances the performance of guest virtual machines by allowing them to communicate with I/O devices without host involvement. But even with device assignment, guests are still unable to approach bare-metal performance, because the host intercepts all interrupts, including those interrupts generated by assigned devices to signal to guests the completion of their I/O requests. The host involvement induces multiple unwarranted guest/host context switches, which significantly hamper the performance of I/O intensive workloads. To solve this problem, we present ELI (ExitLess Interrupts), a software-only approach for handling interrupts within guest virtual machines directly and securely. By removing the host from the interrupt handling path, ELI manages to improve the throughput and latency of unmodified, untrusted guests by 1.3x-1.6x, allowing them to reach 97%-100% of bare-metal performance even for the most demanding I/O-intensive workloads.

international symposium on computer architecture | 2010

IOMMU: strategies for mitigating the IOTLB bottleneck

Nadav Amit; Muli Ben-Yehuda; Ben-Ami Yassour

The input/output memory management unit (IOMMU) was recently introduced into mainstream computer architecture when both Intel and AMD added IOMMUs to their chip-sets. An IOMMU provides memory protection from I/O devices by enabling system software to control which areas of physical memory an I/O device may access. However, this protection incurs additional direct memory access (DMA) overhead due to the required address resolution and validation. IOMMUs include an input/output translation lookaside buffer (IOTLB) to speed-up address resolution, but still every IOTLB cache-miss causes a substantial increase in DMA latency and performance degradation of DMA-intensive workloads. In this paper we first demonstrate the potential negative impact of IOTLB cache-misses on workload performance. We then propose both system software and hardware enhancements to reduce IOTLB miss rate and accelerate address resolution. These enhancements can lead to a reduction of over 60% in IOTLB miss-rate for common I/O intensive workloads.

architectural support for programming languages and operating systems | 2015

rIOMMU: Efficient IOMMU for I/O Devices that Employ Ring Buffers

Moshe Malka; Nadav Amit; Muli Ben-Yehuda; Dan Tsafrir

The IOMMU allows the OS to encapsulate I/O devices in their own virtual memory spaces, thus restricting their DMAs to specific memory pages. The OS uses the IOMMU to protect itself against buggy drivers and malicious/errant devices. But the added protection comes at a cost, degrading the throughput of I/O-intensive workloads by up to an order of magnitude. This cost has motivated system designers to trade off some safety for performance, e.g., by leaving stale information in the IOTLB for a while so as to amortize costly invalidations. We observe that high-bandwidth devices---like network and PCIe SSD controllers---interact with the OS via circular ring buffers that induce a sequential, predictable workload. We design a ring IOMMU (rIOMMU) that leverages this characteristic by replacing the virtual memory page table hierarchy with a circular, flat table. A flat table is adequately supported by exactly one IOTLB entry, making every new translation an implicit invalidation of the former and thus requiring explicit invalidations only at the end of I/O bursts. Using standard networking benchmarks, we show that rIOMMU provides up to 7.56x higher throughput relative to the baseline IOMMU, and that it is within 0.77--1.00x the throughput of a system without IOMMU protection.

symposium on operating systems principles | 2015

Virtual CPU validation

Nadav Amit; Dan Tsafrir; Assaf Schuster; Ahmad Ayoub; Eran Shlomo

Testing the hypervisor is important for ensuring the correct operation and security of systems, but it is a hard and challenging task. We observe, however, that the challenge is similar in many respects to that of testing real CPUs. We thus propose to apply the testing environment of CPU vendors to hypervisors. We demonstrate the advantages of our proposal by adapting Intels testing facility to the Linux KVM hypervisor. We uncover and fix 117 bugs, six of which are security vulnerabilities. We further find four flaws in Intel virtualization technology, causing a disparity between the observable behavior of code running on virtual and bare-metal servers.

Communications of The ACM | 2015

Bare-metal performance for virtual machines with exitless interrupts

Nadav Amit; Abel Gordon; Nadav Har'El; Muli Ben-Yehuda; Alex Landau; Assaf Schuster; Dan Tsafrir

Direct device assignment enhances the performance of guest virtual machines by allowing them to communicate with I/O devices without host involvement. But even with device assignment, guests are still unable to approach bare-metal performance, because the host intercepts all interrupts, including those generated by assigned devices to signal to guests the completion of their I/O requests. The host involvement induces multiple unwarranted guest/host context switches, which significantly hamper the performance of I/O intensive workloads. To solve this problem, we present ExitLess Interrupts (ELI), a software-only approach for handling interrupts within guest virtual machines directly and securely. By removing the host from the interrupt handling path, ELI improves the throughput and latency of unmodified, untrusted guests by 1.3×–1.6×, allowing them to reach 97–100% of bare-metal performance even for the most demanding I/O-intensive workloads.

symposium on cloud computing | 2017

Remote memory in the age of fast networks

Marcos Kawazoe Aguilera; Nadav Amit; Irina Calciu; Xavier Deguillard; Jayneel Gandhi; Pratap Subrahmanyam; Lalith Suresh; Kiran Tati; Rajesh Venkatasubramanian; Michael Wei

As the latency of the network approaches that of memory, it becomes increasingly attractive for applications to use remote memory---random-access memory at another computer that is accessed using the virtual memory subsystem. This is an old idea whose time has come, in the age of fast networks. To work effectively, remote memory must address many technical challenges. In this paper, we enumerate these challenges, discuss their feasibility, explain how some of them are addressed by recent work, and indicate other promising ways to tackle them. Some challenges remain as open problems, while others deserve more study. In this paper, we hope to provide a broad research agenda around this topic, by proposing more problems than solutions.

architectural support for programming languages and operating systems | 2014