Matthew Larsen
University of Oregon
Publications
Featured research published by Matthew Larsen.
IEEE Computer Graphics and Applications | 2016
Kenneth Moreland; Christopher M. Sewell; William Usher; Li-Ta Lo; Jeremy S. Meredith; David Pugmire; James Kress; Hendrik A. Schroots; Kwan-Liu Ma; Hank Childs; Matthew Larsen; Chun-Ming Chen; Robert Maynard; Berk Geveci
One of the most critical challenges for high-performance computing (HPC) scientific visualization is execution on massively threaded processors. Of the many fundamental changes we are seeing in HPC systems, one of the most profound is a reliance on new processor types optimized for execution bandwidth over latency hiding. Our current production scientific visualization software is not designed for these new types of architectures. To address this issue, the VTK-m framework serves as a container for algorithms, provides flexible data representation, and simplifies the design of visualization algorithms on new and future computer architectures.
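A minimal sketch of the data-parallel style this describes, using standard C++ parallel algorithms as a stand-in rather than the actual VTK-m worklet API; the field name and sizes are invented for illustration.

```cpp
// Sketch: the map/reduce style VTK-m encourages, shown with standard C++
// parallel algorithms instead of the real VTK-m worklet interface.
#include <algorithm>
#include <execution>
#include <iostream>
#include <numeric>
#include <vector>

int main()
{
    // A per-point scalar field (e.g., pressure) on a mesh. Values are made up.
    std::vector<double> pressure(1'000'000, 101.3);

    // "Map": apply the same operation independently to every value.
    std::vector<double> anomaly(pressure.size());
    std::transform(std::execution::par, pressure.begin(), pressure.end(),
                   anomaly.begin(),
                   [](double p) { return p - 101.3; });

    // "Reduce": combine per-element results into a single value.
    const double maxAnomaly =
        std::reduce(std::execution::par, anomaly.begin(), anomaly.end(), 0.0,
                    [](double a, double b) { return std::max(a, b); });

    std::cout << "max anomaly: " << maxAnomaly << "\n";
    return 0;
}
```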
IEEE Pacific Visualization Symposium | 2015
Matthew Larsen; Jeremy S. Meredith; Paul A. Navrátil; Hank Childs
Current architectural trends in supercomputers include dramatic increases in the number of cores and available computational power per die, but this power is increasingly difficult for programmers to harness effectively. High-level language constructs can simplify programming many-core devices, but this ease comes with a potential loss of processing power, particularly for cross-platform constructs. Recently, scientific visualization packages have embraced language constructs centering around data parallelism, with familiar operators such as map, reduce, gather, and scatter. Complete adoption of data parallelism will require that central visualization algorithms be revisited and expressed in this new paradigm while preserving both functionality and performance. This investment has a large potential payoff: portable performance in software bases that can span the many architectures that scientific visualization applications run on. With this work, we present a method for ray tracing consisting entirely of data parallel primitives. Given the extreme computational power now prevalent on supercomputer nodes, we believe that ray tracing can supplant rasterization as the work-horse graphics solution for scientific visualization. Our ray tracing method is relatively efficient; we describe its performance with a series of tests and compare it to leading-edge ray tracers that are optimized for specific platforms. We find that our data parallel approach leads to results that are acceptable for many scientific visualization use cases, with the key benefit of providing a single code base that can run on many architectures.
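As an illustration of the "map over rays" flavor of this approach (not the paper's full pipeline, which also builds its acceleration structure out of data parallel primitives), here is a small sketch with an invented scene and camera.

```cpp
// Sketch: per-ray intersection expressed as a data-parallel map. The scene
// (one unit sphere) and the camera setup are invented for illustration.
#include <algorithm>
#include <cmath>
#include <execution>
#include <iostream>
#include <vector>

struct Ray { double ox, oy, oz, dx, dy, dz; };
struct Hit { bool valid; double t; };

// Independent per-ray work: intersect one ray with a unit sphere at the origin.
Hit intersectSphere(const Ray& r)
{
    const double b = 2.0 * (r.ox * r.dx + r.oy * r.dy + r.oz * r.dz);
    const double c = r.ox * r.ox + r.oy * r.oy + r.oz * r.oz - 1.0;
    const double disc = b * b - 4.0 * c;
    if (disc < 0.0) return {false, 0.0};
    return {true, (-b - std::sqrt(disc)) / 2.0};
}

int main()
{
    // One ray per pixel of a hypothetical 512x512 image, all looking down -z.
    std::vector<Ray> rays;
    for (int y = 0; y < 512; ++y)
        for (int x = 0; x < 512; ++x)
            rays.push_back({(x - 256) / 256.0, (y - 256) / 256.0, 5.0,
                            0.0, 0.0, -1.0});

    // "Map" primitive: every ray is processed independently.
    std::vector<Hit> hits(rays.size());
    std::transform(std::execution::par, rays.begin(), rays.end(), hits.begin(),
                   intersectSphere);

    const auto nHit = std::count_if(hits.begin(), hits.end(),
                                    [](const Hit& h) { return h.valid; });
    std::cout << nHit << " of " << rays.size() << " rays hit the sphere\n";
    return 0;
}
```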
Proceedings of the First Workshop on In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization | 2015
Matthew Larsen; Eric Brugger; Hank Childs; Jim Eliot; Kevin S. Griffin; Cyrus Harrison
We present Strawman, a system designed to explore the in situ visualization and analysis needs of simulation code teams planning for multi-physics calculations on exascale architectures. Strawman's design derives from key requirements from a diverse set of simulation code teams, including lightweight usage of shared resources, batch processing, the ability to leverage modern architectures, and ease of use both for software integration and during simulation runs. We describe the Strawman system, the key technologies it depends on, and our experiences integrating Strawman into three proxy simulations. Our findings show that Strawman's design meets our target requirements, and that some of its concepts may be worthy of integration into our community's in situ implementations.
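A rough sketch of the integration pattern such a system implies for a simulation's main loop; the InSitu type and its methods are hypothetical stand-ins, not the actual Strawman API.

```cpp
// Sketch: the shape of a lightweight in situ integration. The simulation
// publishes its mesh each cycle and asks the library to act on it.
// InSitu and its methods are hypothetical, not Strawman's real interface.
#include <vector>

struct Mesh { std::vector<double> coords; std::vector<double> field; };

class InSitu  // hypothetical flyweight interface
{
public:
    void open() {}                 // set up shared resources once
    void publish(const Mesh&) {}   // lightweight description of sim data in place
    void execute() {}              // run the configured vis/analysis actions
    void close() {}                // tear down before the simulation exits
};

int main()
{
    Mesh mesh;    // in a real code this would be the simulation's own data
    InSitu vis;
    vis.open();
    for (int cycle = 0; cycle < 10; ++cycle)
    {
        // ... simulation advances mesh.field here ...
        vis.publish(mesh);   // describe the current cycle's data
        vis.execute();       // e.g., render an image for this cycle
    }
    vis.close();
    return 0;
}
```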
Eurographics Workshop on Parallel Graphics and Visualization | 2015
Matthew Larsen; Stephanie Labasan; Paul A. Navrátil; Jeremy S. Meredith; Hank Childs
Supercomputing designs have recently evolved to include architectures beyond the standard CPU. In response, visualization software must be developed in a manner that obviates the need for porting all visualization algorithms to all architectures. Recent research results indicate that building visualization software on a foundation of data-parallel primitives can meet this goal, providing portability over many architectures, and doing it in a performant way. With this work, we introduce an unstructured data volume rendering algorithm which is composed entirely of data-parallel primitives. We compare the algorithm to community standards, and show that the performance we achieve is similar. That is, although our algorithm is hardware-agnostic, we demonstrate that our performance on GPUs is comparable to code that was written for and optimized for the GPU, and our performance on CPUs is comparable to code written for and optimized for the CPU. The main contribution of this work is in realizing the benefits of data-parallel primitives --- portable performance, longevity, and programmability --- for volume rendering. A secondary contribution is in providing further evidence of the merits of the data-parallel primitives approach itself.
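To make the structure concrete, here is a minimal sketch of the sample-then-composite shape of volume rendering, with the per-sample classification expressed as a data-parallel map. The transfer function and sample values are invented; the paper's algorithm handles unstructured meshes and is built entirely from data-parallel primitives.

```cpp
// Sketch: classify samples along a ray with a "map", then composite
// front to back. Transfer function and scalar values are made up.
#include <algorithm>
#include <execution>
#include <iostream>
#include <vector>

struct Sample { double r, g, b, a; };

int main()
{
    // Scalar values sampled along one ray through the volume (invented).
    std::vector<double> scalars = {0.1, 0.3, 0.8, 0.9, 0.4, 0.2};

    // "Map": classify every sample independently through a transfer function.
    std::vector<Sample> classified(scalars.size());
    std::transform(std::execution::par, scalars.begin(), scalars.end(),
                   classified.begin(), [](double s) {
                       return Sample{s, 0.2, 1.0 - s, 0.3 * s};  // toy ramp
                   });

    // Front-to-back compositing along the ray (an ordered accumulation).
    Sample out{0, 0, 0, 0};
    for (const Sample& c : classified)
    {
        const double w = (1.0 - out.a) * c.a;
        out.r += w * c.r;  out.g += w * c.g;  out.b += w * c.b;  out.a += w;
        if (out.a > 0.99) break;  // early ray termination
    }
    std::cout << "pixel rgba: " << out.r << ' ' << out.g << ' '
              << out.b << ' ' << out.a << "\n";
    return 0;
}
```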
IEEE International Conference on High Performance Computing, Data and Analytics | 2016
Matthew Larsen; Cyrus Harrison; James Kress; David Pugmire; Jeremy S. Meredith; Hank Childs
With the push to exascale, in situ visualization and analysis will continue to play an important role in high performance computing. Tightly coupling in situ visualization with simulations constrains resources for both, and these constraints force a complex balance of trade-offs. A performance model that provides an a priori answer for the cost of using an in situ approach for a given task would assist in managing the trade-offs between simulation and visualization resources. In this work, we present new statistical performance models, based on algorithmic complexity, that accurately predict the run-time cost of a set of representative rendering algorithms, an essential in situ visualization task. To train and validate the models, we conduct a performance study of an MPI+X rendering infrastructure used in situ with three HPC simulation applications. We then explore feasibility issues using the model for selected in situ rendering questions.
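A small sketch of how such an a priori model might be consulted when budgeting in situ work; the functional form and coefficients below are illustrative placeholders, not the paper's fitted models.

```cpp
// Sketch: use a fitted cost model to decide whether a proposed in situ
// render fits a per-cycle time budget. The model form and constants here
// are hypothetical, not the paper's.
#include <cmath>
#include <iostream>

// Hypothetical model: time = a * cells * log2(cells) + b * pixels + c
double predictRenderSeconds(double cells, double pixels)
{
    const double a = 2.0e-9, b = 5.0e-9, c = 0.05;  // made-up fitted constants
    return a * cells * std::log2(cells) + b * pixels + c;
}

int main()
{
    const double cellsPerRank = 8.0e6;       // local mesh size
    const double pixels = 1920.0 * 1080.0;   // image resolution
    const double budget = 0.5;               // seconds of vis time per cycle

    const double predicted = predictRenderSeconds(cellsPerRank, pixels);
    std::cout << "predicted render time: " << predicted << " s\n";
    std::cout << (predicted <= budget ? "fits" : "exceeds")
              << " the per-cycle budget of " << budget << " s\n";
    return 0;
}
```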
Proceedings of the In Situ Infrastructures on Enabling Extreme-Scale Analysis and Visualization | 2017
Matthew Larsen; James P. Ahrens; Utkarsh Ayachit; Eric Brugger; Hank Childs; Berk Geveci; Cyrus Harrison
This paper introduces ALPINE, a flyweight in situ infrastructure. The infrastructure is designed for leading-edge supercomputers, and has support for both distributed-memory and shared-memory parallelism. It can take advantage of computing power on both conventional CPU architectures and on many-core architectures such as NVIDIA GPUs or the Intel Xeon Phi. Further, it has a flexible design that supports integration of new visualization and analysis routines and libraries. The paper describes ALPINE's interface choices and architecture, and also reports on initial experiments performed using the infrastructure.
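A hedged sketch of the declarative, "describe the result" style of interface a flyweight infrastructure like this aims for; the Action type and its fields are hypothetical, not ALPINE's actual interface.

```cpp
// Sketch: the simulation hands over a small declarative list of actions
// rather than calling rendering code directly. Action and its fields are
// invented stand-ins.
#include <iostream>
#include <string>
#include <vector>

struct Action
{
    std::string type;    // e.g. "add_pipeline", "add_scene"
    std::string filter;  // e.g. "contour", "pseudocolor"
    std::string field;   // simulation field the filter operates on
};

int main()
{
    // A tiny action list: contour the "energy" field, then render it.
    std::vector<Action> actions = {
        {"add_pipeline", "contour",     "energy"},
        {"add_scene",    "pseudocolor", "energy"},
    };

    // The infrastructure would interpret this list each cycle; here we just
    // print it to show the shape of the description.
    for (const Action& a : actions)
        std::cout << a.type << ": " << a.filter << " on " << a.field << "\n";
    return 0;
}
```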
Eurographics Workshop on Parallel Graphics and Visualization | 2017
Stephanie Labasan; Matthew Larsen; Hank Childs; Barry Rountree
Power consumption is widely regarded as one of the biggest challenges to reaching the next generation of high-performance computing. One strategy for achieving an exaflop given limited power is hardware overprovisioning. In this model, the theoretical peak power usage of the system is greater than the maximum allowable power usage, and a central manager keeps the aggregate power usage at the maximum by enforcing power caps on each node in the system. For this model to be effective, the central manager must be able to make informed trade-offs between power usage and performance. With this work, we introduce PaViz, a software framework designed to optimize the distribution of power for visualization algorithms, which have different characteristics than simulation codes. In this study, we focus specifically on rendering. Our strategy uses a performance model: nodes predicted to have a small amount of work are allocated less power, and nodes predicted to have a large amount of work are allocated more power. This approach increases the likelihood that all nodes finish at the same time, which is optimal for power efficiency. Our adaptive strategy achieves up to a 33% speedup over the traditional strategy while using the same total power.
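A toy sketch of the underlying allocation idea, distributing a job-wide power bound in proportion to predicted per-node work; the numbers and the clamping policy are illustrative, not PaViz's actual scheme.

```cpp
// Sketch: proportional power allocation under a job-wide bound. Predicted
// work, power bounds, and the clamping policy are invented for illustration.
#include <algorithm>
#include <iostream>
#include <numeric>
#include <vector>

int main()
{
    // Predicted work per node (e.g., from a rendering performance model).
    std::vector<double> work = {1.0, 4.0, 2.0, 8.0};

    const double totalPower = 400.0;          // watts available to the whole job
    const double pMin = 50.0, pMax = 150.0;   // per-node hardware bounds

    const double totalWork = std::accumulate(work.begin(), work.end(), 0.0);

    std::vector<double> cap(work.size());
    for (std::size_t i = 0; i < work.size(); ++i)
    {
        // Proportional share, clamped to what the node can actually use.
        // A real scheme would also redistribute power freed by clamping.
        const double share = totalPower * work[i] / totalWork;
        cap[i] = std::clamp(share, pMin, pMax);
    }

    for (std::size_t i = 0; i < cap.size(); ++i)
        std::cout << "node " << i << ": " << cap[i] << " W\n";
    return 0;
}
```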
Proceedings of the In Situ Infrastructures on Enabling Extreme-Scale Analysis and Visualization | 2017
Shaomeng Li; Matthew Larsen; John Clyne; Hank Childs
In situ compression is a compromise between traditional post hoc and emerging in situ visualization and analysis. While the merits and limitations of various compressor options have been well studied, their performance impacts on scientific simulations are less clear, especially on large-scale supercomputer systems. This study fills this gap by performing in situ compression experiments on a leading supercomputer system. More specifically, we measured the computational and I/O impacts of a lossy wavelet compressor and analyzed the results with respect to various in situ processing concerns. We believe this study provides a better understanding of in situ compression as well as new evidence supporting its viability, in particular for wavelets.
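To illustrate the idea behind lossy wavelet compression, here is a self-contained sketch using a single-level 1D Haar transform with coefficient thresholding; it stands in for, and is far simpler than, the compressor studied in the paper.

```cpp
// Sketch: transform the data, drop small detail coefficients, keep the rest.
// A one-level 1D Haar transform on made-up values illustrates the idea.
#include <cmath>
#include <iostream>
#include <vector>

int main()
{
    std::vector<double> data = {4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0};

    // One level of the Haar transform: pairwise averages and differences.
    std::vector<double> avg, diff;
    for (std::size_t i = 0; i + 1 < data.size(); i += 2)
    {
        avg.push_back((data[i] + data[i + 1]) / 2.0);
        diff.push_back((data[i] - data[i + 1]) / 2.0);
    }

    // Lossy step: zero out detail coefficients below a threshold.
    const double threshold = 0.5;
    std::size_t kept = 0;
    for (double& d : diff)
    {
        if (std::fabs(d) < threshold) d = 0.0;
        else ++kept;
    }
    std::cout << "kept " << kept << " of " << diff.size()
              << " detail coefficients\n";

    // Reconstruction (approximate wherever details were dropped).
    for (std::size_t i = 0; i < avg.size(); ++i)
        std::cout << avg[i] + diff[i] << ' ' << avg[i] - diff[i] << ' ';
    std::cout << "\n";
    return 0;
}
```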
IEEE Symposium on Large Data Analysis and Visualization | 2016
Matthew Larsen; Kenneth Moreland; Christopher R. Johnson; Hank Childs
Sort-last parallel rendering can be improved by considering the rendering of multiple images at a time. Most parallel rendering algorithms consider the generation of only a single image. This makes sense when performing interactive rendering, where the parameters of each rendering are not known until the previous rendering completes. However, in situ visualization often generates multiple images that do not need to be created sequentially. In this paper we present a simple and effective approach to improving parallel image generation throughput by amortizing the load and overhead among multiple image renders. Additionally, we validate our approach by conducting a performance study exploring the achievable speed-ups in a variety of image-based in situ use cases and rendering workloads. On average, our approach shows a 1.5- to 3.7-fold improvement in performance, and in some cases a 10-fold improvement.
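A back-of-the-envelope sketch of why amortization helps: fixed per-pass costs are paid once per batch instead of once per image. The cost numbers are invented to show the shape of the savings, not measurements from the paper.

```cpp
// Sketch: amortizing fixed per-pass overhead across a batch of image renders.
// All timings below are invented for illustration.
#include <iostream>

int main()
{
    const int nImages   = 20;    // images requested this cycle (e.g., a camera sweep)
    const double fixed  = 0.10;  // per-pass overhead in seconds (setup, compositing latency)
    const double perImg = 0.05;  // per-image rendering work in seconds

    const double oneAtATime = nImages * (fixed + perImg);
    const double batched    = fixed + nImages * perImg;

    std::cout << "sequential: " << oneAtATime << " s\n";
    std::cout << "batched:    " << batched << " s\n";
    std::cout << "speedup:    " << oneAtATime / batched << "x\n";
    return 0;
}
```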
IEEE Symposium on Large Data Analysis and Visualization | 2015
Stephanie Labasan; Matthew Larsen; Hank Childs
Power is becoming a major design constraint in the world of high-performance computing (HPC). This constraint affects the hardware being considered for future architectures, the way that hardware will run software, and the design of the software itself. Within this context, we explore tradeoffs between power and performance. Visualization algorithms merit special consideration, since they are more data-intensive in nature than traditional HPC programs like simulation codes. This data-intensive property enables different approaches for optimizing power usage. Our study focuses on the isosurfacing algorithm and explores changes in power and performance as clock frequency changes, since power usage is highly dependent on clock frequency. We vary many of the factors seen in the HPC context, including programming model (MPI vs. OpenMP), implementation (generalized vs. optimized), concurrency, architecture, and data set, and measure how these changes affect power-performance properties. The result is a study that informs the best approaches for optimizing energy usage for a representative visualization algorithm.
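A worked toy example of the power-performance tradeoff for a memory-bound kernel, where run time falls more slowly than power rises with clock frequency; the wattages and timings below are invented for illustration.

```cpp
// Sketch: energy = power x time at two clock frequencies for a data-intensive
// kernel. The numbers are illustrative only, not measurements from the study.
#include <iostream>

int main()
{
    struct Point { double ghz, watts, seconds; };

    // Hypothetical measurements of an isosurfacing run at two frequencies.
    const Point low {1.2, 60.0, 12.0};    // lower clock: less power, somewhat slower
    const Point high{2.4, 130.0, 9.0};    // higher clock: time limited by memory

    const double eLow  = low.watts  * low.seconds;
    const double eHigh = high.watts * high.seconds;

    std::cout << "energy at " << low.ghz  << " GHz: " << eLow  << " J\n";
    std::cout << "energy at " << high.ghz << " GHz: " << eHigh << " J\n";
    std::cout << "slowdown: " << low.seconds / high.seconds << "x, "
              << "energy saved: " << (1.0 - eLow / eHigh) * 100.0 << "%\n";
    return 0;
}
```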