Leonid Oliker
University of California, Berkeley
Publication
Featured research published by Leonid Oliker.
IEEE International Conference on High Performance Computing, Data and Analytics | 2008
Kaushik Datta; Mark Murphy; Vasily Volkov; Samuel Williams; Jonathan Carter; Leonid Oliker; David A. Patterson; John Shalf; Katherine A. Yelick
Understanding the most efficient design and utilization of emerging multicore systems is one of the most challenging questions faced by the mainstream and scientific computing industries in several decades. Our work explores multicore stencil (nearest-neighbor) computations, a class of algorithms at the heart of many structured grid codes, including PDE solvers. We develop a number of effective optimization strategies, and build an auto-tuning environment that searches over our optimizations and their parameters to minimize runtime, while maximizing performance portability. To evaluate the effectiveness of these strategies we explore the broadest set of multicore architectures in the current HPC literature, including the Intel Clovertown, AMD Barcelona, Sun Victoria Falls, IBM QS22 PowerXCell 8i, and NVIDIA GTX280. Overall, our auto-tuning optimization methodology results in the fastest multicore stencil performance to date. Finally, we present several key insights into the architectural tradeoffs of emerging multicore designs and their implications for scientific algorithm development.
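To make the kernel class concrete, the sketch below is a plain 7-point Jacobi sweep for the 3D heat equation. The grid dimensions, coefficients, and loop ordering are illustrative assumptions rather than the paper's tuned code; the auto-tuner described above searches over blocked, unrolled, and otherwise transformed variants of exactly this kind of loop nest.

    /* Illustrative 7-point 3D stencil sweep (Jacobi iteration for the heat
     * equation).  Grid size, coefficients, and loop order are assumptions
     * for illustration only. */
    #include <stddef.h>

    #define NX 256
    #define NY 256
    #define NZ 256
    #define IDX(i, j, k) ((size_t)(i) * NY * NZ + (size_t)(j) * NZ + (size_t)(k))

    void stencil_sweep(const double *restrict in, double *restrict out,
                       double c0, double c1)
    {
        for (int i = 1; i < NX - 1; i++)
            for (int j = 1; j < NY - 1; j++)
                for (int k = 1; k < NZ - 1; k++)
                    out[IDX(i, j, k)] =
                        c0 * in[IDX(i, j, k)] +
                        c1 * (in[IDX(i + 1, j, k)] + in[IDX(i - 1, j, k)] +
                              in[IDX(i, j + 1, k)] + in[IDX(i, j - 1, k)] +
                              in[IDX(i, j, k + 1)] + in[IDX(i, j, k - 1)]);
    }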
International Parallel and Distributed Processing Symposium | 2002
Brian R. Gaeke; Parry Husbands; Xiaoye S. Li; Leonid Oliker; Katherine A. Yelick; Rupak Biswas
The increasing gap between processor and memory performance has led to new architectural models for memory-intensive applications. In this paper, we use a set of memory-intensive benchmarks to evaluate a mixed logic and DRAM processor called VIRAM as a building block for scientific computing. For each benchmark, we explore the fundamental hardware requirements of the problem as well as alternative algorithms and data structures that can help expose fine-grained parallelism or simplify memory access patterns. Results indicate that VIRAM is significantly faster than conventional cache-based machines for problems that are truly limited by the memory system and that it has a significant power advantage across all the benchmarks.
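As a rough illustration of what "truly limited by the memory system" means, the sketch below is a generic gather kernel with almost no arithmetic per memory reference. It is not one of the paper's benchmarks, only an assumed example of the irregular, fine-grained access patterns that favor a processor-in-memory design such as VIRAM over a cache-based machine.

    /* Illustrative memory-bound gather kernel (not taken from the paper).
     * Each iteration does little arithmetic per memory reference, so
     * throughput is set almost entirely by the memory system. */
    #include <stddef.h>

    void gather(double *restrict dst, const double *restrict src,
                const size_t *restrict idx, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[idx[i]];
    }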
Lawrence Berkeley National Laboratory | 2008
Samuel Williams; Kaushik Datta; Jonathan Carter; Leonid Oliker; John Shalf; Katherine A. Yelick; David H. Bailey
We present an auto-tuning approach to optimize application performance on emerging multicore architectures. The methodology extends the idea of search-based performance optimizations, popular in linear algebra and FFT libraries, to application-specific computational kernels. Our work applies this strategy to Sparse Matrix Vector Multiplication (SpMV), the explicit heat equation PDE on a regular grid (Stencil), and a lattice Boltzmann application (LBMHD). We explore one of the broadest sets of multicore architectures in the HPC literature, including the Intel Xeon Clovertown, AMD Opteron Barcelona, Sun Victoria Falls, and the Sony-Toshiba-IBM (STI) Cell. Rather than hand-tuning each kernel for each system, we develop a code generator for each kernel that allows us to identify a highly optimized version for each platform, while amortizing the human programming effort. Results show that our auto-tuned kernel applications often achieve a better than 4X improvement compared with the original code. Additionally, we analyze a Roofline performance model for each platform to reveal hardware bottlenecks and software challenges for future multicore systems and applications.
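A minimal sketch of the search side of such an auto-tuner is shown below. The kernel signature, variant table, and timing harness are assumptions for illustration; the paper's code generators emit far more variants per kernel and per platform (register and cache blocking, SIMD, prefetching, and so on) than a simple list of function pointers.

    /* Minimal sketch of the search half of an auto-tuner: time a set of
     * generated kernel variants and keep the fastest.  kernel_fn and the
     * variant list are hypothetical placeholders. */
    #define _POSIX_C_SOURCE 199309L
    #include <time.h>

    typedef void (*kernel_fn)(void);   /* hypothetical kernel signature */

    static double time_kernel(kernel_fn k)
    {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        k();
        clock_gettime(CLOCK_MONOTONIC, &t1);
        return (t1.tv_sec - t0.tv_sec) + 1e-9 * (t1.tv_nsec - t0.tv_nsec);
    }

    kernel_fn pick_best(kernel_fn *variants, int nvariants)
    {
        kernel_fn best = variants[0];
        double best_t = time_kernel(best);
        for (int v = 1; v < nvariants; v++) {
            double t = time_kernel(variants[v]);
            if (t < best_t) { best_t = t; best = variants[v]; }
        }
        return best;
    }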
Automation, Robotics and Control Systems | 2009
Joseph James Gebis; Leonid Oliker; John Shalf; Samuel Williams; Katherine A. Yelick
The disparity between microprocessor clock frequencies and memory latency is a primary reason why many demanding applications run well below peak achievable performance. Software controlled scratchpad memories, such as the Cell local store, attempt to ameliorate this discrepancy by enabling precise control over memory movement; however, scratchpad technology confronts the programmer and compiler with an unfamiliar and difficult programming model. In this work, we present the Virtual Vector Architecture (ViVA), which combines the memory semantics of vector computers with a software-controlled scratchpad memory in order to provide a more effective and practical approach to latency hiding. ViVA requires minimal changes to the core design and could thus be easily integrated with conventional processor cores. To validate our approach, we implemented ViVA on the Mambo cycle-accurate full system simulator, which was carefully calibrated to match the performance of our underlying PowerPC Apple G5 architecture. Results show that ViVA is able to deliver significant performance benefits over scalar techniques for a variety of memory access patterns as well as two important memory-bound compact kernels, corner turn and sparse matrix-vector multiplication, achieving a 2x to 13x improvement compared to the scalar version. Overall, our preliminary ViVA exploration points to a promising approach for improving application performance on leading microprocessors with minimal design and complexity costs, in a power efficient manner.
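For reference, the sketch below is a plain scalar CSR sparse matrix-vector multiply, one of the two memory-bound kernels named above. It is only an assumed baseline form; a ViVA version would instead stage the irregular x[col[j]] gathers through the software-controlled scratchpad using vector-style loads.

    /* Scalar reference CSR sparse matrix-vector multiply (baseline form
     * only; not the ViVA-accelerated code). */
    #include <stddef.h>

    void spmv_csr(size_t nrows, const size_t *restrict rowptr,
                  const size_t *restrict col, const double *restrict val,
                  const double *restrict x, double *restrict y)
    {
        for (size_t i = 0; i < nrows; i++) {
            double sum = 0.0;
            for (size_t j = rowptr[i]; j < rowptr[i + 1]; j++)
                sum += val[j] * x[col[j]];   /* indirect, latency-bound access */
            y[i] = sum;
        }
    }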
IEEE Hot Chips Symposium | 2008
Samuel Williams; David A. Patterson; Leonid Oliker; John Shalf; Katherine A. Yelick
This article consists of a collection of slides from the authors' conference presentation. The Roofline model is a visually intuitive figure for kernel analysis and optimization. The authors believe undergraduates will find it useful in assessing performance and scalability limitations. The model is easily extended to other architectural paradigms and to other metrics, including performance (sort, graphics, crypto, ...) and bandwidth (L2, PCIe, ...). Performance counters could be used to generate a runtime-specific roofline that would greatly aid optimization.
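A minimal sketch of the Roofline bound itself, using placeholder machine numbers rather than figures from the presentation: attainable performance is the minimum of peak compute throughput and the product of peak memory bandwidth and the kernel's arithmetic intensity (flops per byte moved).

    /* Roofline bound in code form.  The example peak rates and the 0.5
     * flops/byte intensity are assumptions for illustration only. */
    #include <stdio.h>

    static double roofline_gflops(double peak_gflops, double peak_gbps,
                                  double arithmetic_intensity)
    {
        double bw_bound = peak_gbps * arithmetic_intensity;
        return bw_bound < peak_gflops ? bw_bound : peak_gflops;
    }

    int main(void)
    {
        /* A kernel at ~0.5 flops/byte on a 75 GFLOP/s, 20 GB/s machine is
         * bandwidth-bound at about 10 GFLOP/s. */
        printf("%.1f GFLOP/s\n", roofline_gflops(75.0, 20.0, 0.5));
        return 0;
    }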
Archive | 2009
Samuel Williams; Kaushik Datta; Vasily Volkov; Jonathan Carter; Leonid Oliker; John Shalf; Katherine A. Yelick
Archive | 2012
John Shalf; David Donofrio; Leonid Oliker
Archive | 2012
John Shalf; David Donofrio; Leonid Oliker; Jens Krueger; Samuel Williams
IASTED International Conference on Parallel and Distributed Computing and Systems | 2000
Leonid Oliker; Xiaoye S. Li; G. Heber; Rupak Biswas
Archive | 2016
John Shalf; David Donofrio; Leonid Oliker