Razya Ladelsky | Researchain

Publication

Featured researches published by Razya Ladelsky.

International Journal of Parallel Programming | 2011

ACOTES project: Advanced compiler technologies for embedded streaming

Eduard Ayguadé; Cédric Bastoul; Paul M. Carpenter; Zbigniew Chamski; Albert Cohen; Marco Cornero; Philippe Dumont; Marc Duranton; Mohammed Fellahi; Roger Ferrer; Razya Ladelsky; Menno Lindwer; Xavier Martorell; Cupertino Miranda; Dorit Nuzman; Andrea Ornstein; Antoniu Pop; Sebastian Pop; Louis-Noël Pouchet; Alex Ramirez; David Ródenas; Erven Rohou; Ira Rosen; Uzi Shvadron; Konrad Trifunovic; Ayal Zaks

Streaming applications are built of data-driven computational components that consume and produce unbounded data streams. Streaming-oriented systems have become dominant in a wide range of domains, including embedded applications and DSPs. However, programming efficiently for streaming architectures is challenging: the programmer must carefully partition the computation and map it to processes in a way that best matches the underlying streaming architecture, taking into account the distributed resources (memory, processing, real-time requirements) and communication overheads (processing and delay). These challenges have led to a number of suggested solutions whose goal is to improve the programmer’s productivity in developing applications that process massive streams of data on programmable, parallel embedded architectures. StreamIt is one such example. Another, more recent approach is the one developed by the ACOTES project (Advanced Compiler Technologies for Embedded Streaming). The ACOTES approach for streaming applications consists of compiler-assisted mapping of streaming tasks to highly parallel systems in order to maximize cost-effectiveness, both in terms of energy and in terms of design effort. The analysis and transformation techniques automate large parts of the partitioning and mapping process, based on the properties of the application domain, on quantitative information about the target systems, and on programmer directives. This paper presents the outcomes of the ACOTES project, a 3-year collaborative work of industrial (NXP, ST, IBM, Silicon Hive, NOKIA) and academic (UPC, INRIA, MINES ParisTech) partners, and advocates the use of Advanced Compiler Technologies that we developed to support Embedded Streaming.

International Conference on Parallel Processing | 2012

Parallelizing more Loops with Compiler Guided Refactoring

Per Larsen; Razya Ladelsky; Jacob Lidman; Sally A. McKee; Sven Karlsson; Ayal Zaks

The performance of many parallel applications relies not on instruction-level parallelism but on loop-level parallelism. Unfortunately, automatic parallelization of loops is a fragile process; many different obstacles affect or prevent it in practice. To address this predicament, we developed an interactive compilation feedback system that guides programmers in iteratively modifying their application source code. This helps leverage the compiler's ability to generate loop-parallel code. We employ our system to modify two sequential benchmarks dealing with image processing and edge detection, resulting in scalable parallelized code that runs up to 8.3 times faster on an eight-core Intel Xeon 5570 system and up to 12.5 times faster on a quad-core IBM POWER6 system. Benchmark performance varies significantly between the systems. This suggests that semi-automatic parallelization should be combined with target-specific optimizations. Furthermore, comparing the first benchmark to manually parallelized, hand-optimized pthreads and OpenMP versions, we find that code generated using our approach typically outperforms the pthreads code (within 93-339%). It also performs competitively against the OpenMP code (within 75-111%). The second benchmark outperforms manually parallelized and optimized OpenMP code (within 109-242%).

ACM International Conference on Systems and Storage | 2017

Zero-copy receive path in virtio

Kalman Z. Meth; Mike Rapoport; Joel Nider; Razya Ladelsky

In the KVM hypervisor, incoming packets from the network must pass through several objects in the Linux kernel before being delivered to the guest VM. Currently, both the hypervisor and the guest keep their own sets of buffers on the receive path. For large packets, the overall processing time is dominated by the copying of data from hypervisor buffers to guest buffers.

ACM International Conference on Systems and Storage | 2016

IO Core Manager for Virtual Environments

Eyal Moscovici; Dan Tsafrir; Yossi Kuperman; Joel Nider; Razya Ladelsky; Abel Gordon

Para-virtualization is the leading approach in I/O device virtualization. It allows the hypervisor to interpose on and inspect a virtual machine's I/O traffic at run-time. Examples of such interfaces are KVM's virtio [6] and VMware's VMXNET [7]. Current implementations of virtual I/O in the hypervisor have been shown to have performance and scalability limitations [2, 3, 5]. The overhead incurred during interposition arises from two main sources: VM exits and thread scheduling. VM exits occur when the virtual machine requires some intervention of the hypervisor in order to continue execution. VM exits are required to perform I/O tasks since the VM does not have direct access to I/O hardware [1]. The second source of overhead is the hypervisor's thread scheduler, which is not aware of the type of work being performed by a particular thread. When an I/O thread has work (i.e., I/O traffic to process), the scheduler schedules the thread without regard to the latency or throughput requirements of the virtual device. In workloads with a small amount of latency-sensitive traffic, the thread context switches can become prohibitively costly. The limitations can be somewhat mitigated by using the side-core [4] approach, which divides the system cores into two distinct sets: one for running VM guests, and the other dedicated to virtual I/O processing. However, the number of cores that should be assigned to each set depends on the constantly changing workload. For optimum performance, the resources must be allocated according to measurements taken at runtime. We present IOcm, which is able to tune the system automatically for the side-core approach. IOcm provides a better foundation for building practical systems using the side-core approach by improving its usability. IOcm includes mechanisms that expose statistics and controls that allow for better management of the system. We show that IOcm is able to provide comparable performance to a side-core system tuned by an oracle.

Archive | 2011