Dan Quinlan

Lawrence Livermore National Laboratory

Publication

Featured research published by Dan Quinlan.

IEEE Computer | 2011

Rethinking Hardware-Software Codesign for Exascale Systems

John Shalf; Dan Quinlan; Curtis L. Janssen

The rapid and disruptive changes anticipated in hardware design over this next decade necessitate a more agile development process, such as the hardware-software co-design processes developed for rapid product development in the embedded space. This article will describe the structure of the co-design process as applied to supercomputing systems, introduce the role of architectural simulation and code analysis to enable co-design, and describe the CoDEx project that is developing tools to accelerate the iterative co-design cycle for the DOE exascale computing program.

IEEE International Conference on High Performance Computing, Data, and Analytics | 2012

Bamboo: translating MPI applications to a latency-tolerant, data-driven form

Tan Nguyen; Pietro Cicotti; Eric J. Bylaska; Dan Quinlan; Scott B. Baden

We present Bamboo, a custom source-to-source translator that transforms MPI C source into a data-driven form that automatically overlaps communication with available computation. Running on up to 98304 processors of NERSC's Hopper system, we observe that Bamboo's overlap capability speeds up MPI implementations of a 3D Jacobi iterative solver and Cannon's matrix multiplication. Bamboo's generated code meets or exceeds the performance of hand-optimized MPI, which includes split-phase coding, the method classically employed to hide communication. We achieved our results with only modest amounts of programmer annotation and no intrusive reprogramming of the original application source.

IEEE International Conference on High Performance Computing, Data, and Analytics | 2007

Tool Support for Inspecting the Code Quality of HPC Applications

Thomas Panas; Dan Quinlan; Richard W. Vuduc

The nature of HPC application development encourages ad hoc design and implementation, rather than formal requirements analysis and design specification as is typical in software engineering. However, we cannot simply expect HPC developers to adopt formal software engineering processes wholesale, even while there is a need to improve software structure and quality to ensure future maintainability. Therefore, we propose tools that HPC developers can use at their discretion to obtain feedback on the structure and quality of their codes. This feedback would come in the form of code quality metrics and analyses, presented when necessary in intuitive and interactive visualizations. This paper summarizes our implementation of just such a tool, which we apply to a standard HPC benchmark as proof-of-concept.

ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2010

A symbolic verifier for CUDA programs

Guodong Li; Ganesh Gopalakrishnan; Robert M. Kirby; Dan Quinlan

We present a preliminary automated verifier based on mechanical decision procedures which is able to prove functional correctness of CUDA programs and is guaranteed to detect bugs such as race conditions. We also employ a symbolic partial order reduction (POR) technique to mitigate the interleaving explosion problem.

IMA workshop on structured adaptive mesh refinement grid methods, Minneapolis, MN (United States), 12-13 Mar 1997 | 2000

AMR++: A design for parallel object-oriented adaptive mesh refinement

Dan Quinlan

Adaptive mesh refinement computations are complicated by their dynamic nature. In the serial environment they require substantial infrastructures to support the regridding processes, intergrid operations, and local bookkeeping of positions of grids relative to one another. In the parallel environment the dynamic behavior is more problematic because it requires dynamic distribution support and load balancing. Parallel AMR is further complicated by substantial task parallelism in addition to the obvious data parallelism; this task parallelism requires additional infrastructure to support efficiently. The degree of parallelism is typically dependent upon the algorithms in use and the equations being solved. Different algorithms have significant compromises between computation and communication. Substantial research work is often required to define efficient methods and suitable infrastructure. The purpose of this paper is to introduce AMR++ as an object-oriented library which forms a part of the OVERTURE framework, a much larger object-oriented numerical framework developed and supported at Los Alamos National Laboratory and distributed on the Web for the last several years.

Journal of Parallel and Distributed Computing | 2017

Automatic translation of MPI source into a latency-tolerant, data-driven form

Tan Nguyen; Pietro Cicotti; Eric J. Bylaska; Dan Quinlan; Scott B. Baden

Hiding communication behind useful computation is an important performance programming technique but remains an inscrutable programming exercise even for the expert. We present Bamboo, a code transformation framework that can realize communication overlap in applications written in MPI without the need to intrusively modify the source code. We reformulate MPI source into a task dependency graph representation, which partially orders the tasks, enabling the program to execute in a data-driven fashion under the control of an external runtime system. Experimental results demonstrate that Bamboo significantly reduces communication delays while requiring only modest amounts of programmer annotation for a variety of applications and platforms, including those employing co-processors and accelerators. Moreover, Bamboo's performance meets or exceeds that of labor-intensive hand coding. The translator is more than a means of hiding communication costs automatically; it demonstrates the utility of semantic level optimization against a well-known library.

Highlights: Bamboo is a translator that can reformulate MPI source into a task graph form. Bamboo supports both point-to-point and collective communication. Bamboo supports GPUs, hiding communication among GPUs and between hosts and GPUs. Bamboo speeds up applications containing elaborate data and control structures.
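
The data-driven execution described in this abstract can be illustrated with a small sketch. This is not Bamboo's implementation and does not use MPI: it is a hypothetical, single-process Python model in which each task runs as soon as the tasks it depends on have finished. The function name `run_data_driven` and the task names are assumptions made for illustration only.

```python
from collections import deque


def run_data_driven(tasks, deps):
    """Run each task once every task it depends on has finished.

    tasks: dict mapping a task name to a zero-argument callable.
    deps:  dict mapping a task name to the names it waits on
           (every name is assumed to be a key of `tasks`).
    Returns the order in which the tasks ran.
    """
    # Count unmet dependencies and record who is waiting on whom.
    waiting = {name: len(deps.get(name, ())) for name in tasks}
    dependents = {name: [] for name in tasks}
    for name, needs in deps.items():
        for need in needs:
            dependents[need].append(name)

    # Tasks with no unmet dependencies are ready immediately.
    ready = deque(name for name in tasks if waiting[name] == 0)
    order = []
    while ready:
        name = ready.popleft()
        tasks[name]()
        order.append(name)
        # Finishing a task may release the tasks that waited on it.
        for nxt in dependents[name]:
            waiting[nxt] -= 1
            if waiting[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(tasks):
        raise ValueError("dependency cycle: some tasks never became ready")
    return order


# Hypothetical stencil step: interior work does not need the halo data,
# so it is free to run while the halo exchange is still pending.
log = []
tasks = {
    "send_halo": lambda: log.append("send_halo"),
    "compute_interior": lambda: log.append("compute_interior"),
    "recv_halo": lambda: log.append("recv_halo"),
    "compute_boundary": lambda: log.append("compute_boundary"),
}
deps = {
    "recv_halo": ["send_halo"],
    "compute_boundary": ["recv_halo", "compute_interior"],
}
order = run_data_driven(tasks, deps)
```

In this toy model the dependency graph only partially orders the tasks: `compute_interior` has no dependency on the halo exchange, which is the property a real runtime could exploit to overlap communication with computation.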

Archive | 2011

The ROSE Source-to-Source Compiler Infrastructure

Dan Quinlan; Chunhua Liao

Presented at: PGAS 2011, Galveston, TX, United States, Oct 15 - Oct 18, 2010 | 2011

Runtime Detection of C-Style Errors in UPC Code

Peter Pirkelbauer; Chunhua Liao; Thomas Panas; Dan Quinlan

Archive | 2007

Shared and Distributed Memory Parallel Security Analysis of Large-Scale Source Code and Binary Applications

Dan Quinlan; Gergö Barany; Thomas Panas

Journal Name: CT Watch Quarterly; Journal Volume: 3; Journal Issue: 4; Related Information: Journal Publication Date: Nov 2007 | 2007

Performance Engineering: Understanding and Improving the Performance of Large-Scale Codes

David H. Bailey; Robert F. Lucas; Paul D. Hovland; Boyana Norris; Katherine A. Yelick; Dan Gunter; Bronis R. de Supinski; Dan Quinlan; Pat Worley; Jeff Vetter; Phil Roth; John M. Mellor-Crummey; Allan Snavely; Jeffrey K. Hollingsworth; Daniel A. Reed; Rob Fowler; Ying Zhang; Mary W. Hall; Jacque Chame; Jack Dongarra; Shirley Moore
