Publications


Featured research published by Li-Ta Lo.


Eurographics Workshop on Parallel Graphics and Visualization | 2012

PISTON: A Portable Cross-Platform Framework for Data-Parallel Visualization Operators

Li-Ta Lo; Christopher M. Sewell; James P. Ahrens

Due to the wide variety of current and next-generation supercomputing architectures, the development of high-performance parallel visualization and analysis operators frequently requires re-writing the underlying algorithms for many different platforms. In order to facilitate portability, we have devised a framework for creating such operators that employs the data-parallel programming model. By writing the operators using only data-parallel primitives (such as scans, transforms, stream compactions, etc.), the same code may be compiled to multiple targets using architecture-specific backend implementations of these primitives. Specifically, we make use of and extend NVIDIA's Thrust library, which provides CUDA and OpenMP backends. Using this framework, we have implemented isosurface, cut surface, and threshold operators, and have achieved good parallel performance on two different architectures (multi-core CPUs and NVIDIA GPUs) using the exact same operator code. We have applied these operators to several large, real scientific data sets, and have released an open-source beta version of our code base.
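To make the primitive-composition idea concrete, here is a minimal Thrust sketch of a threshold operator expressed as a stream compaction. This is an illustration in the spirit of PISTON, not the released PISTON code; the same source compiles against Thrust's CUDA backend (nvcc) or OpenMP backend (g++ with THRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_OMP) unchanged.

#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/iterator/counting_iterator.h>
#include <iostream>

// Predicate: does a cell's scalar value pass the threshold?
struct above_threshold
{
    float min_value;
    __host__ __device__
    bool operator()(float v) const { return v >= min_value; }
};

int main()
{
    thrust::device_vector<float> field(1000);
    // ... fill 'field' with per-cell scalar data ...

    // Stream compaction: keep the indices of all cells that pass.
    thrust::device_vector<int> kept(field.size());
    auto end = thrust::copy_if(
        thrust::counting_iterator<int>(0),
        thrust::counting_iterator<int>(static_cast<int>(field.size())),
        field.begin(),            // stencil: the scalar field
        kept.begin(),
        above_threshold{0.5f});
    kept.resize(end - kept.begin());

    std::cout << kept.size() << " cells pass the threshold\n";
    return 0;
}

Because the operator is built only from primitives, it inherits whatever parallel backend Thrust is compiled against, which is the portability property the paper exploits.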


IEEE Computer Graphics and Applications | 2016

VTK-m: Accelerating the Visualization Toolkit for Massively Threaded Architectures

Kenneth Moreland; Christopher M. Sewell; William Usher; Li-Ta Lo; Jeremy S. Meredith; David Pugmire; James Kress; Hendrik A. Schroots; Kwan-Liu Ma; Hank Childs; Matthew Larsen; Chun-Ming Chen; Robert Maynard; Berk Geveci

One of the most critical challenges for high-performance computing (HPC) scientific visualization is execution on massively threaded processors. Of the many fundamental changes we are seeing in HPC systems, one of the most profound is a reliance on new processor types optimized for execution bandwidth over latency hiding. Our current production scientific visualization software is not designed for these new types of architectures. To address this issue, the VTK-m framework serves as a container for algorithms, provides flexible data representation, and simplifies the design of visualization algorithms on new and future computer architectures.
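As a flavor of the programming model (a sketch against the public API of recent VTK-m releases, not an example from the paper), an algorithm is written once as a "worklet" and VTK-m schedules it on whatever device adapter is available:

#include <vtkm/VectorAnalysis.h>
#include <vtkm/cont/ArrayHandle.h>
#include <vtkm/cont/Invoker.h>
#include <vtkm/worklet/WorkletMapField.h>

// A field-map worklet: one invocation per array element.
struct VectorMagnitude : vtkm::worklet::WorkletMapField
{
    using ControlSignature = void(FieldIn vectors, FieldOut magnitudes);
    using ExecutionSignature = _2(_1);

    template <typename T>
    VTKM_EXEC T operator()(const vtkm::Vec<T, 3>& v) const
    {
        return vtkm::Magnitude(v);  // per-element body, device-agnostic
    }
};

void ComputeMagnitudes(const vtkm::cont::ArrayHandle<vtkm::Vec3f>& vecs,
                       vtkm::cont::ArrayHandle<vtkm::FloatDefault>& mags)
{
    // The invoker dispatches to an available device adapter
    // (serial, TBB, OpenMP, CUDA, ...); the worklet never changes.
    vtkm::cont::Invoker invoke;
    invoke(VectorMagnitude{}, vecs, mags);
}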


IEEE International Conference on High Performance Computing, Data and Analytics | 2015

Large-scale compute-intensive analysis via a combined in-situ and co-scheduling workflow approach

Christopher M. Sewell; Katrin Heitmann; Hal Finkel; George Zagaris; Suzanne T Parete-Koon; Patricia K. Fasel; Adrian Pope; Nicholas Frontiere; Li-Ta Lo; O. E. Bronson Messer; Salman Habib; James P. Ahrens

Large-scale simulations can produce hundreds of terabytes to petabytes of data, complicating and limiting the efficiency of workflows. Traditionally, outputs are stored on the file system and analyzed in post-processing. With the rapidly increasing size and complexity of simulations, this approach faces an uncertain future. Trending techniques consist of performing the analysis in-situ, utilizing the same resources as the simulation, and/or off-loading subsets of the data to a compute-intensive analysis system. We introduce an analysis framework developed for HACC, a cosmological N-body code, that uses both in-situ and co-scheduling approaches for handling petabyte-scale outputs. We compare different analysis set-ups ranging from purely off-line, to purely in-situ, to in-situ/co-scheduling. The analysis routines are implemented using the PISTON/VTK-m framework, allowing a single implementation of an algorithm that simultaneously targets a variety of GPU, multi-core, and many-core architectures.
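A co-scheduling workflow of this kind is typically wired up by partitioning the job's MPI ranks into a simulation group and an analysis group. The skeleton below shows one generic way to make that split; the rank policy is hypothetical and this is not the HACC workflow code.

#include <mpi.h>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Illustrative policy: the last quarter of the ranks do analysis.
    const int is_analysis = world_rank >= (3 * world_size) / 4;

    // Split into a simulation-only or analysis-only communicator.
    MPI_Comm role_comm;
    MPI_Comm_split(MPI_COMM_WORLD, is_analysis, world_rank, &role_comm);

    if (is_analysis) {
        // ... receive snapshots from partner simulation ranks and run
        // compute-intensive analysis (e.g., halo finding) here ...
    } else {
        // ... advance the simulation, periodically shipping snapshots
        // to the analysis group with point-to-point MPI calls ...
    }

    MPI_Comm_free(&role_comm);
    MPI_Finalize();
    return 0;
}

Data then moves between the two groups over the interconnect rather than through the file system, which is precisely the bottleneck of the purely off-line set-up.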


IEEE Symposium on Large Data Analysis and Visualization | 2014

Data-parallel halo finding with variable linking lengths

Wathsala Widanagamaachchi; Peer-Timo Bremer; Christopher M. Sewell; Li-Ta Lo; James P. Ahrens; Valerio Pascucci

State-of-the-art cosmological simulations regularly contain billions of particles, providing scientists the opportunity to study the evolution of the Universe in great detail. However, the rate at which these simulations generate data severely taxes existing analysis techniques. Therefore, developing new scalable alternatives is essential for continued scientific progress. Here, we present a data-parallel, friends-of-friends halo finding algorithm that provides unprecedented flexibility in the analysis by extracting multiple linking lengths. Even for a single linking length, it is as fast as the existing techniques, and is portable to multi-threaded many-core systems as well as co-processing resources. Our system is implemented using PISTON and is coupled to an interactive analysis environment used to study halos at different linking lengths and track their evolution over time.
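For reference, the friends-of-friends definition itself is compact: particles closer than the linking length are "friends", and halos are the connected components of the friendship graph. The serial union-find sketch below states that definition only; it does not reproduce the paper's data-parallel, multi-linking-length formulation.

#include <cstddef>
#include <numeric>
#include <vector>

struct Particle { float x, y, z; };

// Union-find root with path halving.
static int find_root(std::vector<int>& parent, int i)
{
    while (parent[i] != i) {
        parent[i] = parent[parent[i]];
        i = parent[i];
    }
    return i;
}

// Label every particle with the id of its friends-of-friends halo.
// O(n^2) pairwise version, purely to illustrate the definition.
std::vector<int> fof_halos(const std::vector<Particle>& p, float link_len)
{
    const float r2 = link_len * link_len;
    std::vector<int> parent(p.size());
    std::iota(parent.begin(), parent.end(), 0);

    for (std::size_t i = 0; i < p.size(); ++i)
        for (std::size_t j = i + 1; j < p.size(); ++j) {
            const float dx = p[i].x - p[j].x;
            const float dy = p[i].y - p[j].y;
            const float dz = p[i].z - p[j].z;
            if (dx * dx + dy * dy + dz * dz < r2)   // friends?
                parent[find_root(parent, (int)i)] =
                    find_root(parent, (int)j);
        }

    std::vector<int> halo_id(p.size());
    for (std::size_t i = 0; i < p.size(); ++i)
        halo_id[i] = find_root(parent, (int)i);
    return halo_id;
}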


IEEE Symposium on Large-Scale Data Analysis and Visualization (LDAV) | 2013

Portable data-parallel visualization and analysis in distributed memory environments

Christopher M. Sewell; Li-Ta Lo; James P. Ahrens

Data-parallelism is a programming model that maps well to architectures with a high degree of concurrency. Algorithms written using data-parallel primitives can be easily ported to any architecture for which an implementation of these primitives exists, making efficient use of the available parallelism on each. We have previously published results demonstrating our ability to compile the same data-parallel code for several visualization algorithms onto different on-node parallel architectures (GPUs and multi-core CPUs) using our extension of NVIDIA's Thrust library. In this paper, we discuss our extension of Thrust to support concurrency in distributed memory environments across multiple nodes. This enables the application developer to write data-parallel algorithms while viewing the data as single, long vectors, without needing to explicitly consider whether the values are actually distributed across nodes. Our distributed wrapper for Thrust handles the communication in the backend using MPI, while still using the standard Thrust library to take advantage of available on-node parallelism. We describe the details of our distributed implementations of several key data-parallel primitives, including scan, scatter/gather, sort, reduce, and upper/lower bound. We also present two higher-level distributed algorithms developed using these primitives: isosurface and KD-tree construction. Finally, we provide timing results demonstrating the ability of these algorithms to take advantage of available parallelism on nodes and across multiple nodes, and discuss scaling limitations for communication-intensive algorithms such as KD-tree construction.
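As an illustration of how such a distributed primitive decomposes (a sketch of the general technique, not the paper's wrapper code), a distributed inclusive scan needs one local Thrust scan, one MPI exclusive scan of the per-rank totals, and one local shift:

#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/iterator/constant_iterator.h>
#include <thrust/scan.h>
#include <thrust/transform.h>
#include <mpi.h>

// Inclusive scan over a vector whose pieces are distributed across ranks.
void distributed_inclusive_scan(thrust::device_vector<long long>& local)
{
    // 1. On-node: ordinary Thrust scan of this rank's piece.
    thrust::inclusive_scan(local.begin(), local.end(), local.begin());

    // 2. Across nodes: exclusive scan of the per-rank totals.
    long long my_total = local.empty() ? 0 : (long long)local.back();
    long long offset = 0;
    MPI_Exscan(&my_total, &offset, 1, MPI_LONG_LONG, MPI_SUM,
               MPI_COMM_WORLD);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) offset = 0;  // MPI_Exscan leaves rank 0 undefined

    // 3. On-node: shift this rank's piece by the global offset.
    thrust::transform(local.begin(), local.end(),
                      thrust::constant_iterator<long long>(offset),
                      local.begin(), thrust::plus<long long>());
}

The same pattern (local primitive, small collective, local fix-up) extends to reduce and the bounds queries, while sort and scatter/gather require heavier cross-node data exchange, which is where the communication-bound scaling limits the abstract mentions arise.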


IEEE International Conference on High Performance Computing, Data and Analytics | 2012

The SDAV Software Frameworks for Visualization and Analysis on Next-Generation Multi-Core and Many-Core Architectures

Christopher M. Sewell; Jeremy S. Meredith; Kenneth Moreland; Tom Peterka; David E. DeMarle; Li-Ta Lo; James P. Ahrens; Robert Maynard; Berk Geveci

This paper surveys the four software frameworks being developed as part of the visualization pillar of the SDAV (Scalable Data Management, Analysis, and Visualization) Institute, one of the SciDAC (Scientific Discovery through Advanced Computing) Institutes established by the ASCR (Advanced Scientific Computing Research) Program of the U.S. Department of Energy. These frameworks include EAVL (Extreme-scale Analysis and Visualization Library), DAX (Data Analysis at Extreme), DIY (Do It Yourself), and PISTON. The objective of these frameworks is to facilitate the adaptation of visualization and analysis algorithms to take advantage of the available parallelism in emerging multi-core and many-core hardware architectures, in anticipation of the need for such algorithms to be run in-situ with LCF (leadership-class facilities) simulation codes on supercomputers.


Workshop on Ultrascale Visualization | 2008

Petascale visualization: Approaches and initial results

James P. Ahrens; Li-Ta Lo; Boonthanome Nouanesengsy; John Patchett; Allen McPherson

With the advent of the first petascale supercomputer, Los Alamos's Roadrunner, there is a pressing need to address how to visualize petascale data. The crux of the petascale visualization performance problem is interactive rendering, since it is the most computationally intensive portion of the visualization process. For terascale platforms, commodity clusters with graphics processing units (GPUs) have been used for interactive rendering. For petascale platforms, visualization and rendering may be able to run efficiently on the supercomputer platform itself. In this work, we evaluated the rendering performance of multi-core CPU and GPU-based processors. To achieve high performance on multi-core processors, we tested with multi-core optimized raytracing engines for rendering. For real-world performance testing, and to prepare for petascale visualization tasks, we interfaced these rendering engines with VTK and ParaView. Initial results show that rendering software optimized for multi-core CPU processors provides performance competitive with GPUs for the parallel rendering of massive data. The current multi-core architectural trend suggests that multi-core-based supercomputers can provide interactive visualization and rendering support now and in the future.


CGVC '16 Proceedings of the Conference on Computer Graphics & Visual Computing | 2016

Hybrid data-parallel contour tree computation

Hamish A. Carr; Christopher M. Sewell; Li-Ta Lo; James P. Ahrens

As data sets increase in size beyond the petabyte, it is increasingly important to have automated methods for data analysis and visualisation. While topological analysis tools such as the contour tree and Morse-Smale complex are now well established, there is still a shortage of efficient parallel algorithms for their computation, in particular for massively data-parallel computation on a SIMD model. We report the first data-parallel algorithm for computing the fully augmented contour tree, using a quantised computation model. We then extend this to provide a hybrid data-parallel/distributed algorithm allowing scaling beyond a single GPU or CPU, and provide results for its computation. Our implementation uses the portable data-parallel primitives provided by NVIDIA's Thrust library, allowing us to compile the same code for both GPUs and multi-core CPUs.
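To fix the underlying idea, the sketch below is the classic serial join-tree sweep (Carr et al.), which the paper reformulates with data-parallel primitives; it is not the paper's algorithm. Vertices are visited from highest to lowest value, a union-find tracks superlevel-set components, and each vertex attaches to every distinct component it touches (distinct vertex values are assumed for simplicity):

#include <algorithm>
#include <numeric>
#include <utility>
#include <vector>

static int root(std::vector<int>& parent, int v)
{
    while (parent[v] != v) {
        parent[v] = parent[parent[v]];
        v = parent[v];
    }
    return v;
}

// values: scalar per vertex; nbrs: mesh adjacency lists.
// Returns augmented join-tree arcs as (upper vertex, lower vertex).
std::vector<std::pair<int, int>> join_tree(
    const std::vector<float>& values,
    const std::vector<std::vector<int>>& nbrs)
{
    const int n = (int)values.size();
    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return values[a] > values[b]; });

    std::vector<int> parent(n), lowest(n);  // lowest vertex per component
    std::iota(parent.begin(), parent.end(), 0);
    std::iota(lowest.begin(), lowest.end(), 0);
    std::vector<char> seen(n, 0);
    std::vector<std::pair<int, int>> arcs;

    for (int v : order) {                   // sweep downward
        seen[v] = 1;
        for (int u : nbrs[v])
            if (seen[u]) {
                const int ru = root(parent, u);
                const int rv = root(parent, v);
                if (ru != rv) {             // v joins another component
                    arcs.emplace_back(lowest[ru], v);
                    parent[ru] = rv;
                }
            }
        lowest[root(parent, v)] = v;        // v is now the lowest vertex
    }
    return arcs;
}

The split tree is the same sweep run from lowest to highest value, and merging the two yields the contour tree.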


IEEE Symposium on Large Data Analysis and Visualization | 2015

Utilizing many-core accelerators for halo and center finding within a cosmology simulation

Christopher M. Sewell; Li-Ta Lo; Katrin Heitmann; Salman Habib; James P. Ahrens

Efficiently finding and computing statistics about “halos” (regions of high density) are essential analysis steps for N-body cosmology simulations. However, in state-of-the-art simulation codes, these analysis operators do not currently take advantage of the shared-memory data-parallelism available on multi-core and many-core architectures. The Hybrid / Hardware Accelerated Cosmology Code (HACC) is designed as an MPI+X code, but the analysis operators are parallelized only among MPI ranks, because of the difficulty of porting different X implementations (e.g., OpenMP, CUDA) across all architectures on which it is run. In this paper, we present portable data-parallel algorithms for several variations of halo finding and halo center finding algorithms. These are implemented with the PISTON component of the VTK-m framework, which uses NVIDIA's Thrust library to construct data-parallel algorithms that allow a single implementation to be compiled to multiple backends to target a variety of multi-core and many-core architectures. Finally, we compare the performance of our halo and center finding algorithms against the original HACC implementations on the Moonlight, Stampede, and Titan supercomputers. The portability of Thrust allowed the same code to run efficiently on each of these architectures. On Titan, the performance improvements using our code have enabled halo analysis to be performed on a very large data set (8192³ particles across 16,384 nodes of Titan) for which analysis using only the existing CPU algorithms was not feasible.
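As a taste of the primitive-based style (an illustrative sketch, not the paper's halo-finder code), per-halo statistics such as a simple center-of-mass "center" reduce to a sort by halo id followed by segmented reductions; the most-bound-particle centers the paper treats require substantially more work than this:

#include <cstddef>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/iterator/constant_iterator.h>
#include <thrust/reduce.h>
#include <thrust/sort.h>
#include <thrust/transform.h>

// halo_id[i]: label from a halo finder; x[i]: one coordinate of
// particle i. Repeat per coordinate for full center-of-mass centers
// (equal-mass particles assumed). The caller sizes the output buffers
// to the particle count; the return value is the number of halos.
std::size_t halo_centers_1d(thrust::device_vector<int>& halo_id,
                            thrust::device_vector<float>& x,
                            thrust::device_vector<int>& out_id,
                            thrust::device_vector<float>& out_center)
{
    // Group particles by halo.
    thrust::sort_by_key(halo_id.begin(), halo_id.end(), x.begin());

    // Segmented reduction: per-halo coordinate sums ...
    auto ends = thrust::reduce_by_key(halo_id.begin(), halo_id.end(),
                                      x.begin(),
                                      out_id.begin(), out_center.begin());
    const std::size_t n_halos = ends.first - out_id.begin();

    // ... and per-halo particle counts, via a constant-one stream.
    thrust::device_vector<float> counts(n_halos);
    thrust::reduce_by_key(halo_id.begin(), halo_id.end(),
                          thrust::constant_iterator<float>(1.0f),
                          out_id.begin(), counts.begin());

    // Mean coordinate per halo = sum / count.
    thrust::transform(out_center.begin(), out_center.begin() + n_halos,
                      counts.begin(), out_center.begin(),
                      thrust::divides<float>());
    return n_halos;
}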


EuroVis (Short Papers) | 2015

Visualization and Analysis of Large-Scale Atomistic Simulations of Plasma-Surface Interactions

Wathsala Widanagamaachchi; Karl D. Hammond; Li-Ta Lo; Brian D. Wirth; Francesca Samsel; Christopher M. Sewell; James P. Ahrens; Valerio Pascucci

We present a simulation–visualization pipeline that uses the LAMMPS Molecular Dynamics Simulator and the Visualization Toolkit to create a visualization and analysis environment for atomistic simulations of plasma–surface interactions. These simulations are used to understand the origin of fuzz-like, microscopic damage to tungsten and other metal surfaces by helium. The proposed pipeline serves both as an aid to visualization, i.e., drawing the surfaces of gas bubbles and voids/cavities in the metal, and as a means of analysis, i.e., extracting various statistics and details of gas bubble evolution. The result is a better understanding of the void and bubble formation process that is difficult, if not impossible, to obtain using conventional atomistic visualization software.

Collaboration


Dive into Li-Ta Lo's collaboration.

Top Co-Authors

James P. Ahrens, Los Alamos National Laboratory
Christopher M. Sewell, Los Alamos National Laboratory
Qiang Guan, Los Alamos National Laboratory
Allen McPherson, Los Alamos National Laboratory
John Patchett, Los Alamos National Laboratory
Xin Liang, University of California
Christopher Mitchell, University of Central Florida
Jeremy S. Meredith, Oak Ridge National Laboratory
Jieyang Chen, University of California