Is this you? Create Your Porfile

David Camp

Lawrence Berkeley National Laboratory

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where David Camp is active.

Explore More

Publication

Featured researches published by David Camp.

conference on high performance computing (supercomputing) | 2000

High performance visualization of time-varying volume data over a wide-area network status

Kwan-Liu Ma; David Camp

This paper presents an end-to-end, low-cost solution for visualizing time-varying volume data rendered on a parallel computer located at a remote site. Pipelining and careful grouping of processors are used to hide I/O time and to maximize processors utilization. Compression is used to significantly cut down the cost of transferring output images from the parallel computer to a display device through a widearea network. This complete rendering pipeline makes possible highly efficient rendering and remote viewing of high resolution time-varying data sets in the absence of high-speed network and parallel I/O support. To study the performance of this rendering pipeline and to demonstrate high-performance remote visualization, tests were conducted on a PC cluster in Japan as well as an SGI Origin 2000 operated at the NASA Ames Research Center with the display located at UC Davis.

IEEE Transactions on Visualization and Computer Graphics | 2011

Streamline Integration Using MPI-Hybrid Parallelism on a Large Multicore Architecture

David Camp; Christoph Garth; Hank Childs; David Pugmire; Kenneth I. Joy

Streamline computation in a very large vector field data set represents a significant challenge due to the nonlocal and data-dependent nature of streamline integration. In this paper, we conduct a study of the performance characteristics of hybrid parallel programming and execution as applied to streamline integration on a large, multicore platform. With multicore processors now prevalent in clusters and supercomputers, there is a need to understand the impact of these hybrid systems in order to make the best implementation choice. We use two MPI-based distribution approaches based on established parallelization paradigms, parallelize over seeds and parallelize over blocks, and present a novel MPI-hybrid algorithm for each approach to compute streamlines. Our findings indicate that the work sharing between cores in the proposed MPI-hybrid parallel implementation results in much improved performance and consumes less communication and I/O bandwidth than a traditional, nonhybrid distributed implementation.

ieee symposium on large data analysis and visualization | 2014

Improved post hoc flow analysis via Lagrangian representations

Alexy Agranovsky; David Camp; Christoph Garth; E. Wes Bethel; Kenneth I. Joy; Hank Childs

Fluid mechanics considers two frames of reference for an observer watching a flow field: Eulerian and Lagrangian. The former is the frame of reference traditionally used for flow analysis, and involves extracting particle trajectories based on a vector field. With this work, we explore the opportunities that arise when considering these trajectories from the Lagrangian frame of reference. Specifically, we consider a form where flows are extracted in situ and then used for subsequent post hoc analysis. We believe this alternate, Lagrangian-based form will be increasingly useful, because the Eulerian frame of reference is sensitive to temporal frequency, and architectural trends are causing temporal frequency to drop rapidly on modern supercomputers. We support our viewpoint by running a series of experiments, which demonstrate the Lagrangian form can be more accurate, require less I/O, and be faster when compared to traditional advection.

ieee symposium on large data analysis and visualization | 2011

Evaluating the benefits of an extended memory hierarchy for parallel streamline algorithms

David Camp; Hank Childs; Amit Chourasia; Christoph Garth; Kenneth I. Joy

The increasing cost of achieving sufficient I/O bandwidth for high end supercomputers is leading to architectural evolutions in the I/O subsystem space. Currently popular designs create a staging area on each compute node for data output via solid state drives (SSDs), local hard drives, or both. In this paper, we investigate whether these extensions to the memory hierarchy, primarily intended for computer simulations that produce data, can also benefit visualization and analysis programs that consume data. Some algorithms, such as those that read the data only once and store the data in primary memory, can not draw obvious benefit from the presence of a deeper memory hierarchy. However, algorithms that read data repeatedly from disk are excellent candidates, since the repeated reads can be accelerated by caching the first read of a block on the new resources (i.e. SSDs or hard drives). We study such an algorithm, streamline computation, and quantify the benefits it can derive.

eurographics workshop on parallel graphics and visualization | 2013

GPU acceleration of particle advection workloads in a parallel, distributed memory setting

David Camp; Harinarayan Krishnan; David Pugmire; Christoph Garth; Ian Johnson; E. Wes Bethel; Kenneth I. Joy; Hank Childs

Although there has been significant research in GPU acceleration, both of parallel simulation codes (i.e., GPGPU) and of single GPU visualization and analysis algorithms, there has been relatively little research devoted to visualization and analysis algorithms on GPU clusters. This oversight is significant: parallel visualization and analysis algorithms have markedly different characteristics -- computational load, memory access pattern, communication, idle time, etc. -- than the other two categories. In this paper, we explore the benefits of GPU acceleration for particle advection in a parallel, distributed-memory setting. As performance properties can differ dramatically between particle advection use cases, our study operates over a variety of workloads, designed to reveal insights about underlying trends. This work has a three-fold aim: (1) to map a challenging visualization and analysis algorithm -- particle advection -- to a complex system (a cluster of GPUs), (2) to inform its performance characteristics, and (3) to evaluate the advantages and disadvantages of using the GPU. In our performance study, we identify which factors are and are not relevant for obtaining a speedup when using GPUs. In short, this study informs the following question: if faced with a parallel particle advection problem, should you implement the solution with CPUs, with GPUs, or does it not matter?

ieee symposium on large data analysis and visualization | 2012

Parallel stream surface computation for large data sets

David Camp; Hank Childs; Christoph Garth; David Pugmire; Kenneth I. Joy

Parallel stream surface calculation, while highly related to other particle advection-based techniques such as streamlines, has its own unique characteristics that merit independent study. Specifically, stream surfaces require new integral curves to be added continuously during execution to ensure surface quality and accuracy; performance can be improved by specifically accounting for these additional particles. We present an algorithm for generating stream surfaces in a distributed-memory parallel setting. The algorithm incorporates multiple schemes for parallelizing particle advection and we study which schemes work best. Further, we explore speculative calculation and how it can improve overall performance. In total, this study informs the efficient calculation of stream surfaces in parallel for large data sets, based on existing integral curve functionality.

2013 IEEE Symposium on Large-Scale Data Analysis and Visualization (LDAV) | 2013

Distributed parallel particle advection using work requesting

Cornelius Müller; David Camp; Bernd Hentschel; Christoph Garth

Particle advection is an important vector field visualization technique that is difficult to apply to very large data sets in a distributed setting due to scalability limitations in existing algorithms. In this paper, we report on several experiments using work requesting dynamic scheduling which achieves balanced work distribution on arbitrary problems with minimal communication overhead. We present a corresponding prototype implementation, provide and analyze benchmark results, and compare our results to an existing algorithm.

visualization and data analysis | 2015

Subsampling-based compression and flow visualization

Alexy Agranovsky; David Camp; Kenneth I. Joy; Hank Childs

As computational capabilities increasingly outpace disk speeds on leading supercomputers, scientists will, in turn, be increasingly unable to save their simulation data at its native resolution. One solution to this problem is to compress these data sets as they are generated and visualize the compressed results afterwards. We explore this approach, specifically subsampling velocity data and the resulting errors for particle advection-based flow visualization. We compare three techniques: random selection of subsamples, selection at regular locations corresponding to multi-resolution reduction, and introduce a novel technique for informed selection of subsamples. Furthermore, we explore an adaptive system which exchanges the subsampling budget over parallel tasks, to ensure that subsampling occurs at the highest rate in the areas that need it most. We perform supercomputing runs to measure the effectiveness of the selection and adaptation techniques. Overall, we find that adaptation is very effective, and, among selection techniques, our informed selection provides the most accurate results, followed by the multi-resolution selection, and with the worst accuracy coming from random subsamples.

international parallel and distributed processing symposium | 2015

Improving Performance of Structured-Memory, Data-Intensive Applications on Multi-core Platforms via a Space-Filling Curve Memory Layout

E. Wes Bethel; David Camp; David Donofrio; Mark Howison

Many data-intensive algorithms -- particularly in visualization, image processing, and data analysis -- operate on structured data, that is, data organized in multidimensional arrays. While many of these algorithms are quite numerically intensive, by and large, their performance is limited by the cost of memory accesses. As we move towards the exascale regime of computing, one central research challenge is finding ways to minimize data movement through the memory hierarchy, particularly within a node in a shared-memory parallel setting. We study the effects that an alternative in-memory data layout format has in terms of runtime performance gains resulting from reducing the amount of data moved through the memory hierarchy. We focus the study on shared-memory parallel implementations of two algorithms common in visualization and analysis: a stencil-based convolution kernel, which uses a structured memory access pattern, and ray casting volume rendering, which uses a semi-structured memory access pattern. The question we study is to better understand to what degree an alternative memory layout, when used by these key algorithms, will result in improved runtime performance and memory system utilization. Our approach uses a layout based on a Z-order (Morton-order) space-filling curve data organization, and we measure and report runtime and various metrics and counters associated with memory system utilization. Our results show nearly uniform improved runtime performance and improved utilization of the memory hierarchy across varying levels of concurrency the applications we tested. This approach is complementary to other memory optimization strategies like cache blocking, but may also be more general and widely applicable to a diverse set of applications.

ieee international conference on high performance computing, data, and analytics | 2014

Particle advection performance over varied architectures and workloads

Hank Childs; Scott Biersdorff; David Poliakoff; David Camp; Allen D. Malony

Particle advection is a foundational operation for many flow visualization techniques, including streamlines, Finite-Time Lyapunov Exponents (FTLE) calculation, and stream surfaces. The workload for particle advection problems varies greatly, including significant variation in computational requirements. With this study, we consider the performance impacts from hardware architecture on this problem, studying distributed-memory systems with CPUs with varying amounts of cores per node, and with nodes with one to three GPUs. Our goal was to explore which architectures were best suited to which workloads, and why. While the results of this study will help inform visualization scientists which architectures they should use when solving certain flow visualization problems, it is also informative for the larger HPC community, since many simulation codes will soon incorporate visualization via in situ techniques.

Explore More