Publications


Featured research published by Jeremy S. Meredith.


Architectural Support for Programming Languages and Operating Systems | 2010

The Scalable Heterogeneous Computing (SHOC) benchmark suite

Anthony Danalis; Gabriel Marin; Collin McCurdy; Jeremy S. Meredith; Philip C. Roth; Kyle Spafford; Vinod Tipparaju; Jeffrey S. Vetter

Scalable heterogeneous computing systems, which are composed of a mix of compute devices, such as commodity multicore processors, graphics processors, reconfigurable processors, and others, are gaining attention as one approach to continuing performance improvement while managing the new challenge of energy efficiency. As these systems become more common, it is important to be able to compare and contrast architectural designs and programming systems in a fair and open forum. To this end, we have designed the Scalable HeterOgeneous Computing benchmark suite (SHOC). SHOC's initial focus is on systems containing graphics processing units (GPUs) and multi-core processors, and on the new OpenCL programming standard. SHOC is a spectrum of programs that test the performance and stability of these scalable heterogeneous computing systems. At the lowest level, SHOC uses microbenchmarks to assess architectural features of the system. At higher levels, SHOC uses application kernels to determine system-wide performance, including many system features such as intranode and internode communication among devices. SHOC includes benchmark implementations in both OpenCL and CUDA in order to provide a comparison of these programming models.
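
As a rough illustration of the layered structure the abstract describes (this is a hypothetical sketch, not SHOC's actual code), a microbenchmark in such a suite times a repeated operation, converts the result to a throughput figure, and reports it under a benchmark name. A real SHOC benchmark would time a device transfer or kernel via CUDA or OpenCL; here a host-side copy stands in so the sketch compiles without a GPU runtime.

```cpp
// Hypothetical sketch of a SHOC-style microbenchmark harness (illustrative only).
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

struct Result {
    const char* name;   // benchmark name, e.g. "copy_bandwidth"
    double value;       // measured value
    const char* unit;   // e.g. "GB/s"
};

Result measureCopyBandwidth(std::size_t bytes, int repetitions) {
    std::vector<char> src(bytes, 1), dst(bytes);
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repetitions; ++i)
        dst = src;                              // stand-in for a device transfer
    auto stop = std::chrono::steady_clock::now();
    double seconds = std::chrono::duration<double>(stop - start).count();
    double gbPerSec = (double(bytes) * repetitions) / seconds / 1e9;
    return {"copy_bandwidth", gbPerSec, "GB/s"};
}

int main() {
    Result r = measureCopyBandwidth(64 << 20, 10);   // 64 MiB, 10 repetitions
    std::printf("%s: %.2f %s\n", r.name, r.value, r.unit);
    return 0;
}
```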


IEEE Visualization | 2005

A contract based system for large data visualization

Hank Childs; Eric Brugger; Kathleen S. Bonnell; Jeremy S. Meredith; Mark C. Miller; Brad Whitlock; Nelson L. Max

VisIt is a richly featured visualization tool that is used to visualize some of the largest simulations ever run. The scale of these simulations requires that optimizations be incorporated into every operation VisIt performs. But the set of applicable optimizations VisIt can perform depends on the types of operations being done. Complicating the issue, VisIt has a plugin capability that allows new, unforeseen components to be added, making it even harder to determine which optimizations can be applied. We introduce the concept of a contract to the standard data flow network design. This contract enables each component of the data flow network to modify the set of optimizations used. In addition, the contract allows new components to be accommodated gracefully within VisIt's data flow network system.
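
To illustrate the contract idea with a toy sketch (hypothetical types and names, not VisIt's actual interfaces): a contract travels from the sink of the pipeline back toward the source before execution, and each filter augments it, for example by declaring which variables it needs and whether ghost zones must be generated. The source then reads only what the final contract requests.

```cpp
// Minimal sketch of contract propagation through a data flow pipeline (illustrative only).
#include <iostream>
#include <memory>
#include <set>
#include <string>
#include <vector>

struct Contract {
    std::set<std::string> neededVariables;  // variables the downstream filters require
    bool needsGhostZones = false;           // whether ghost data must be generated
};

struct Filter {
    virtual ~Filter() = default;
    // Each filter augments the contract coming from further downstream.
    virtual void modifyContract(Contract& c) const = 0;
};

struct ContourFilter : Filter {
    std::string variable;
    explicit ContourFilter(std::string v) : variable(std::move(v)) {}
    void modifyContract(Contract& c) const override {
        c.neededVariables.insert(variable);
        c.needsGhostZones = true;           // contouring needs ghost zones for continuity
    }
};

struct SliceFilter : Filter {
    void modifyContract(Contract&) const override {}  // slicing adds no new requirements
};

int main() {
    std::vector<std::unique_ptr<Filter>> pipeline;
    pipeline.push_back(std::make_unique<ContourFilter>("pressure"));
    pipeline.push_back(std::make_unique<SliceFilter>());

    // Build the contract by walking from the sink back toward the source.
    Contract contract;
    for (auto it = pipeline.rbegin(); it != pipeline.rend(); ++it)
        (*it)->modifyContract(contract);

    std::cout << "ghost zones: " << std::boolalpha << contract.needsGhostZones << "\n";
    for (const auto& v : contract.neededVariables)
        std::cout << "read variable: " << v << "\n";
    return 0;
}
```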


Eurographics Workshop on Parallel Graphics and Visualization | 2011

Parallel in situ coupling of simulation with a fully featured visualization system

Brad Whitlock; Jean M. Favre; Jeremy S. Meredith

There is a widening gap between compute performance and the ability to store computation results. Complex scientific codes are the most affected, since they must save massive files containing meshes and fields for offline analysis. Time and storage costs instead dictate that data analysis and visualization be combined with the simulations themselves and done in situ, so that data are transformed to a manageable size before they are stored. Earlier approaches to in situ processing involved embedding specific visualization algorithms in the simulation code, limiting flexibility. We introduce a new library which instead allows a fully featured visualization tool, VisIt, to request data as needed from the simulation and apply visualization algorithms in situ with minimal modification to the application code.
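
The central mechanism described here is that the simulation exposes its data through callbacks rather than writing files; the coupling library (VisIt's libsim in the paper) pulls meshes and fields on demand. The sketch below is a hypothetical, heavily simplified rendition of that callback pattern with invented names; it is not the actual libsim API.

```cpp
// Hypothetical sketch of in situ coupling via data-access callbacks (illustrative only).
#include <cstddef>
#include <cstdio>
#include <functional>
#include <vector>

struct Field {
    const double* data;   // pointer into simulation-owned memory, no copy made
    std::size_t size;
};

// The "visualization side": it asks for data only when it needs it.
class InSituCoupler {
public:
    void registerFieldProvider(std::function<Field()> provider) { provider_ = provider; }
    void visualize(int step) {
        Field f = provider_();                        // pull data from the simulation
        double sum = 0.0;
        for (std::size_t i = 0; i < f.size; ++i) sum += f.data[i];
        std::printf("step %d: processed %zu values, mean %.3f\n",
                    step, f.size, sum / double(f.size));
    }
private:
    std::function<Field()> provider_;
};

int main() {
    std::vector<double> temperature(1000, 300.0);     // simulation-owned field
    InSituCoupler coupler;
    coupler.registerFieldProvider([&] { return Field{temperature.data(), temperature.size()}; });

    for (int step = 0; step < 5; ++step) {
        for (auto& t : temperature) t += 0.1;          // advance the "simulation"
        if (step % 2 == 0) coupler.visualize(step);    // in situ analysis on selected steps
    }
    return 0;
}
```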


Computing in Science and Engineering | 2011

Keeneland: Bringing Heterogeneous GPU Computing to the Computational Science Community

Jeffrey S. Vetter; Richard Glassbrook; Jack J. Dongarra; Karsten Schwan; Bruce Loftis; Stephen McNally; Jeremy S. Meredith; James H. Rogers; Philip C. Roth; Kyle Spafford; Sudhakar Yalamanchili

The Keeneland project's goal is to develop and deploy an innovative, GPU-based high-performance computing system for the NSF computational science community.


Lawrence Berkeley National Laboratory | 2009

FastBit: interactively searching massive data

Kesheng Wu; Sean Ahern; Edward W Bethel; Jacqueline H. Chen; Hank Childs; E. Cormier-Michel; Cameron Geddes; Junmin Gu; Hans Hagen; Bernd Hamann; Wendy S. Koegler; Jerome Lauret; Jeremy S. Meredith; Peter Messmer; Ekow J. Otoo; V Perevoztchikov; A. M. Poskanzer; Prabhat; Oliver Rübel; Arie Shoshani; Alexander Sim; Kurt Stockinger; Gunther H. Weber; W. M. Zhang

As scientific instruments and computer simulations produce more and more data, the task of locating the essential information to gain insight becomes increasingly difficult. FastBit is an efficient software tool to address this challenge. In this article, we present a summary of the key underlying technologies, namely bitmap compression, encoding, and binning. Together these techniques enable FastBit to answer structured (SQL) queries orders of magnitude faster than popular database systems. To illustrate how FastBit is used in applications, we present three examples involving a high-energy physics experiment, a combustion simulation, and an accelerator simulation. In each case, FastBit significantly reduces the response time and enables interactive exploration on terabytes of data.
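
The key techniques named in the abstract, binning and bitmap indexing, can be illustrated with a toy sketch (hypothetical code, not FastBit itself): each bin of a variable gets a bitmap marking which records fall into it, and a range query becomes a bitwise OR over the bitmaps of the bins it covers, so matching records are found without scanning the raw data.

```cpp
// Toy sketch of binned bitmap indexing in the spirit of FastBit (not FastBit code).
#include <cstdint>
#include <cstdio>
#include <vector>

// Build one bitmap per bin: bit i is set when record i falls into that bin.
std::vector<std::vector<std::uint64_t>> buildBinnedIndex(const std::vector<double>& values,
                                                         double lo, double hi, int bins) {
    std::size_t words = (values.size() + 63) / 64;
    std::vector<std::vector<std::uint64_t>> index(bins, std::vector<std::uint64_t>(words, 0));
    for (std::size_t i = 0; i < values.size(); ++i) {
        int b = int((values[i] - lo) / (hi - lo) * bins);
        if (b < 0) b = 0;
        if (b >= bins) b = bins - 1;
        index[b][i / 64] |= std::uint64_t(1) << (i % 64);
    }
    return index;
}

// Answer "value falls in bins [binFirst, binLast]" by ORing the covered bitmaps.
std::vector<std::uint64_t> rangeQuery(const std::vector<std::vector<std::uint64_t>>& index,
                                      int binFirst, int binLast) {
    std::vector<std::uint64_t> hits(index[0].size(), 0);
    for (int b = binFirst; b <= binLast; ++b)
        for (std::size_t w = 0; w < hits.size(); ++w)
            hits[w] |= index[b][w];
    return hits;
}

int main() {
    std::vector<double> energy = {0.1, 0.9, 0.4, 0.7, 0.2, 0.95};
    auto index = buildBinnedIndex(energy, 0.0, 1.0, 4);   // 4 equal-width bins
    auto hits = rangeQuery(index, 2, 3);                  // records with energy >= 0.5
    for (std::size_t i = 0; i < energy.size(); ++i)
        if (hits[i / 64] >> (i % 64) & 1) std::printf("record %zu matches\n", i);
    return 0;
}
```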


European Conference on Parallel Processing | 2010

Maestro: data orchestration and tuning for OpenCL devices

Kyle Spafford; Jeremy S. Meredith; Jeffrey S. Vetter

As heterogeneous computing platforms become more prevalent, the programmer must account for complex memory hierarchies in addition to the difficulties of parallel programming. OpenCL is an open standard for parallel computing that helps alleviate this difficulty by providing a portable set of abstractions for device memory hierarchies. However, OpenCL requires that the programmer explicitly control data transfer and device synchronization, two tedious and error-prone tasks. This paper introduces Maestro, an open source library for data orchestration on OpenCL devices. Maestro provides automatic data transfer, task decomposition across multiple devices, and autotuning of dynamic execution parameters for some types of problems.
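
The autotuning aspect can be sketched as follows (hypothetical code, not Maestro's API): a library times a workload under several candidate values of a dynamic execution parameter, such as a chunk size or a work split between devices, and keeps the fastest. The stand-in workload below pays a per-chunk setup cost, so larger chunks amortize better.

```cpp
// Hypothetical sketch of autotuning a dynamic execution parameter (not Maestro's API).
#include <chrono>
#include <cstdio>
#include <functional>
#include <vector>

// Time one invocation of `work` with the candidate parameter and return seconds.
double timeOnce(const std::function<void(int)>& work, int parameter) {
    auto start = std::chrono::steady_clock::now();
    work(parameter);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Try every candidate and return the one with the lowest measured time.
int autotune(const std::function<void(int)>& work, const std::vector<int>& candidates) {
    int best = candidates.front();
    double bestTime = timeOnce(work, best);
    for (std::size_t i = 1; i < candidates.size(); ++i) {
        double t = timeOnce(work, candidates[i]);
        if (t < bestTime) { bestTime = t; best = candidates[i]; }
    }
    return best;
}

int main() {
    // Stand-in workload: each chunk pays a fixed setup cost, so chunk size matters.
    auto work = [](int chunkSize) {
        double sink = 0;
        std::vector<double> buffer;
        for (int i = 0; i < (1 << 22); i += chunkSize) {
            buffer.assign(chunkSize, 1.0);             // per-chunk setup cost
            for (double v : buffer) sink = sink + v;
        }
        if (sink < 0) std::printf("unreachable\n");    // keep the work observable
    };
    int best = autotune(work, {64, 256, 1024, 4096});
    std::printf("selected chunk size: %d\n", best);
    return 0;
}
```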


IEEE Symposium on Parallel and Large-Data Visualization and Graphics | 2001

Multiresolution view-dependent splat based volume rendering of large irregular data

Jeremy S. Meredith; Kwan-Liu Ma

We present techniques for multiresolution approximation and hardware-assisted, splat-based rendering to achieve interactive volume visualization of large irregular data sets. We examine two methods of generating multiple resolutions of irregular volumetric grids and a data structure supporting the splatting approach for volume rendering. These techniques are implemented in combination with a view-dependent, error-based resolution selection to maintain accuracy at both low and high zoom levels. In addition, the error tolerance may be adjusted at run time to obtain the desired balance between high frame rates and accurate rendering. Along with an effective way to compute gradients for lighting, we offer an integrated solution for interactive volume rendering of irregular-mesh or meshless data, and we demonstrate our technique on unstructured-grid data sets from aerodynamic flow simulations.
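
A simplified version of the view-dependent selection step looks like the sketch below (hypothetical code that mirrors the idea, not the paper's implementation): given a precomputed object-space approximation error for each resolution level, pick the coarsest level whose projected screen-space error stays under the user's pixel tolerance, so zoomed-out views use fewer splats and close-up views fall back to the finest level.

```cpp
// Sketch of view-dependent, error-based resolution selection (illustrative only).
#include <cstdio>
#include <vector>

// Approximation error of each resolution level in object-space units,
// ordered from coarsest (largest error) to finest (smallest error).
struct LevelSet {
    std::vector<double> objectSpaceError;
};

// Project an object-space length to pixels for a simple perspective camera.
double projectToPixels(double objectLength, double distance,
                       double fieldOfViewScale, double viewportPixels) {
    return objectLength / distance * fieldOfViewScale * viewportPixels;
}

// Choose the coarsest level whose projected error is within the tolerance.
int selectLevel(const LevelSet& levels, double distance,
                double fovScale, double viewportPixels, double pixelTolerance) {
    for (std::size_t i = 0; i < levels.objectSpaceError.size(); ++i) {
        double pixels = projectToPixels(levels.objectSpaceError[i], distance,
                                        fovScale, viewportPixels);
        if (pixels <= pixelTolerance) return int(i);
    }
    return int(levels.objectSpaceError.size()) - 1;   // fall back to the finest level
}

int main() {
    LevelSet levels{{0.50, 0.10, 0.02}};               // coarse, medium, fine
    const double distances[] = {300.0, 60.0, 5.0};     // zoomed out -> zoomed in
    for (double distance : distances)
        std::printf("distance %6.1f -> level %d\n", distance,
                    selectLevel(levels, distance, 1.0, 1024.0, 2.0));
    return 0;
}
```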


Computing Frontiers | 2012

The tradeoffs of fused memory hierarchies in heterogeneous computing architectures

Kyle Spafford; Jeremy S. Meredith; Seyong Lee; Dong Li; Philip C. Roth; Jeffrey S. Vetter

With the rise of general purpose computing on graphics processing units (GPGPU), the influence of consumer markets can now be seen across the spectrum of computer architectures. In fact, many of the high-ranking Top500 HPC systems now include these accelerators. Traditionally, GPUs have connected to the CPU via the PCIe bus, which has proved to be a significant bottleneck for scalable scientific applications. Now, a trend toward tighter integration between CPU and GPU has removed this bottleneck and unified the memory hierarchy for both CPU and GPU cores. We examine the impact of this trend for high performance scientific computing by using AMD's Fusion Accelerated Processing Unit (APU) as a testbed. In particular, we evaluate the tradeoffs in performance, power consumption, and programmability when comparing this unified memory hierarchy with similar, but discrete, GPUs.
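
The core tradeoff can be captured with back-of-envelope arithmetic (all numbers below are assumed for illustration, not measurements from the paper): a discrete GPU pays a PCIe transfer cost but computes faster, while a fused part avoids the copy at the price of lower raw throughput, so the crossover depends on how many bytes must move per unit of computation.

```cpp
// Illustrative arithmetic for the discrete-vs-fused tradeoff (assumed numbers only).
#include <cstdio>

int main() {
    double bytes      = 1.0e9;    // data moved per kernel launch (1 GB, assumed)
    double flops      = 1.0e11;   // work per launch (assumed)

    double pcieGBs    = 8.0;      // assumed PCIe bandwidth, GB/s
    double discreteGF = 1000.0;   // assumed discrete GPU throughput, GFLOP/s
    double fusedGF    = 400.0;    // assumed fused (APU) GPU throughput, GFLOP/s

    double discreteTime = bytes / (pcieGBs * 1e9) + flops / (discreteGF * 1e9);
    double fusedTime    = flops / (fusedGF * 1e9);          // no PCIe copy needed

    std::printf("discrete GPU: %.3f s (copy + compute)\n", discreteTime);
    std::printf("fused APU:    %.3f s (compute only)\n", fusedTime);
    std::printf("fused wins when bytes per flop exceeds roughly %.4f\n",
                (1.0 / fusedGF - 1.0 / discreteGF) * pcieGBs);
    return 0;
}
```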


General Purpose Processing on Graphics Processing Units | 2011

Quantifying NUMA and contention effects in multi-GPU systems

Kyle Spafford; Jeremy S. Meredith; Jeffrey S. Vetter

As system architects strive for increased density and power efficiency, the traditional compute node is being augmented with an increasing number of graphics processing units (GPUs). The integration of multiple GPUs per node introduces complex performance phenomena including non-uniform memory access (NUMA) and contention for shared system resources. Utilizing the Keeneland system, this paper quantifies these effects and presents some guidance on programming strategies to maximize performance in multi-GPU environments.
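
One of the programming strategies this kind of study motivates is NUMA-aware device selection: pin each host thread to the GPU attached nearest its NUMA node. The sketch below illustrates that mapping logic with an assumed, hard-coded affinity table; a real implementation would discover the topology (for example from the PCIe and NUMA hierarchy) rather than hard-coding it, and all names here are hypothetical.

```cpp
// Hypothetical sketch of NUMA-aware GPU selection (assumed topology, illustrative only).
#include <cstdio>
#include <vector>

struct GpuInfo {
    int deviceId;
    int nearestNumaNode;   // NUMA node whose memory controller is closest to this GPU
};

// Prefer a GPU attached to the caller's NUMA node; fall back to any device.
int pickGpuForNumaNode(const std::vector<GpuInfo>& gpus, int numaNode) {
    for (const auto& g : gpus)
        if (g.nearestNumaNode == numaNode) return g.deviceId;
    return gpus.front().deviceId;
}

int main() {
    // Assumed dual-socket node with two GPUs, one attached per socket.
    std::vector<GpuInfo> gpus = {{0, 0}, {1, 1}};
    for (int numaNode = 0; numaNode < 2; ++numaNode)
        std::printf("thread on NUMA node %d -> use GPU %d\n",
                    numaNode, pickGpuForNumaNode(gpus, numaNode));
    return 0;
}
```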


IEEE International Conference on High Performance Computing, Data and Analytics | 2008

High performance multivariate visual data exploration for extremely large data

Oliver Rübel; Prabhat; Kesheng Wu; Hank Childs; Jeremy S. Meredith; Cameron Geddes; E. Cormier-Michel; Sean Ahern; Gunther H. Weber; Peter Messmer; Hans Hagen; Bernd Hamann; E. Wes Bethel

One of the central challenges in modern science is the need to quickly derive knowledge and understanding from large, complex collections of data. We present a new approach that addresses this challenge by combining and extending techniques from high performance visual data analysis and scientific data management. This approach is demonstrated within the context of gaining insight from complex, time-varying datasets produced by a laser wakefield accelerator simulation. Our approach leverages histogram-based parallel coordinates both for visual information display and as a vehicle for guiding a data mining operation. Data extraction and subsetting are implemented with state-of-the-art index/query technology. This approach, while applied here to accelerator science, is generally applicable to a broad set of science applications, and is implemented in a production-quality visual data analysis infrastructure. We conduct a detailed performance analysis and demonstrate good scalability on a distributed memory Cray XT4 system.
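
To illustrate the histogram-based parallel coordinates idea in miniature (hypothetical code, not the paper's implementation): for each pair of adjacent axes the display is driven by a 2D histogram, so the counts of records falling into each (bin on axis A, bin on axis B) cell replace individual polylines and rendering cost depends on the number of bins rather than the number of records.

```cpp
// Toy sketch of the 2D binning behind histogram-based parallel coordinates (illustrative only).
#include <cstdio>
#include <vector>

// Map a value in [lo, hi) to one of `bins` equal-width bins.
int binOf(double value, double lo, double hi, int bins) {
    int b = int((value - lo) / (hi - lo) * bins);
    return b < 0 ? 0 : (b >= bins ? bins - 1 : b);
}

int main() {
    const int bins = 4;
    // Two adjacent axes (e.g., particle energy and momentum), values in [0, 1).
    std::vector<double> axisA = {0.10, 0.30, 0.35, 0.80, 0.82, 0.90};
    std::vector<double> axisB = {0.20, 0.25, 0.20, 0.70, 0.75, 0.10};

    // counts[i][j] = number of records in bin i on axis A and bin j on axis B.
    std::vector<std::vector<int>> counts(bins, std::vector<int>(bins, 0));
    for (std::size_t r = 0; r < axisA.size(); ++r)
        ++counts[binOf(axisA[r], 0.0, 1.0, bins)][binOf(axisB[r], 0.0, 1.0, bins)];

    // Each nonzero cell becomes one weighted band drawn between the two axes.
    for (int i = 0; i < bins; ++i)
        for (int j = 0; j < bins; ++j)
            if (counts[i][j] > 0)
                std::printf("band A-bin %d -> B-bin %d, weight %d\n", i, j, counts[i][j]);
    return 0;
}
```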

Collaboration


Dive into Jeremy S. Meredith's collaboration network.

Top Co-Authors

Jeffrey S. Vetter, Oak Ridge National Laboratory
Sean Ahern, Oak Ridge National Laboratory
Kyle Spafford, Oak Ridge National Laboratory
Bernd Hamann, University of California
Philip C. Roth, Oak Ridge National Laboratory