
Publication


Featured research published by Laksono Adhianto.


Concurrency and Computation: Practice and Experience | 2009

HPCTOOLKIT: tools for performance analysis of optimized parallel programs

Laksono Adhianto; S. Banerjee; Mike Fagan; Mark W. Krentel; Gabriel Marin; John M. Mellor-Crummey; Nathan R. Tallent

HPCTOOLKIT is an integrated suite of tools that supports measurement, analysis, attribution, and presentation of application performance for both sequential and parallel programs. HPCTOOLKIT can pinpoint and quantify scalability bottlenecks in fully optimized parallel programs with a measurement overhead of only a few percent. Recently, new capabilities were added to HPCTOOLKIT for collecting call path profiles for fully optimized codes without any compiler support, pinpointing and quantifying bottlenecks in multithreaded programs, exploring performance information and source code using a new user interface, and displaying hierarchical space–time diagrams based on traces of asynchronous call path samples. This paper provides an overview of HPCTOOLKIT and illustrates its utility for performance analysis of parallel applications.
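
For illustration of the core measurement idea, asynchronous call path sampling can be sketched in C with a POSIX interval timer and glibc's backtrace(). This is only a minimal sketch of the general technique, not HPCTOOLKIT's implementation; the sample period and frame limit below are arbitrary choices.

```c
/* Minimal sketch of asynchronous call path sampling (not HPCTOOLKIT's
 * implementation): a SIGPROF timer fires periodically and the handler
 * captures the current call stack.  backtrace() is a glibc extension. */
#include <execinfo.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

#define MAX_FRAMES 64

static volatile sig_atomic_t samples_taken = 0;

static void sample_handler(int sig)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);   /* capture the call path */
    samples_taken++;
    /* A real profiler would attribute this sample to a calling-context
     * tree; here we just report the frame count.  (fprintf is not
     * async-signal-safe; acceptable only in a sketch.) */
    fprintf(stderr, "sample %ld: %d frames\n", (long)samples_taken, n);
}

static double work(long iters)
{
    double s = 0.0;
    for (long i = 1; i <= iters; i++) s += 1.0 / (double)i;
    return s;
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sample_handler;
    sigaction(SIGPROF, &sa, NULL);

    /* Fire SIGPROF every 5 ms of CPU time (about 200 samples/second). */
    struct itimerval it = { {0, 5000}, {0, 5000} };
    setitimer(ITIMER_PROF, &it, NULL);

    printf("result = %f, samples = %ld\n", work(200000000L), (long)samples_taken);
    return 0;
}
```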


Proceedings of the Third Conference on Partitioned Global Address Space Programming Models | 2009

A new vision for coarray Fortran

John M. Mellor-Crummey; Laksono Adhianto; William N. Scherer; Guohua Jin

In 1998, Numrich and Reid proposed Coarray Fortran as a simple set of extensions to Fortran 95 [7]. Their principal extension to Fortran was support for shared data known as coarrays. In 2005, the Fortran Standards Committee began exploring the addition of coarrays to Fortran 2008, which is now being finalized. Careful review of drafts of the emerging Fortran 2008 standard led us to identify several shortcomings in the proposed coarray extensions. In this paper, we briefly critique the coarray extensions proposed for Fortran 2008, outline a new vision for coarrays in the Fortran language that is far more expressive, and briefly describe our strategy for implementing the language extensions that we propose.
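
For readers unfamiliar with the coarray model, the one-sided remote data access it provides can be loosely approximated in C with MPI one-sided communication. The sketch below is an analogy chosen for this summary, not code from the paper; the array size and rank choices are arbitrary.

```c
/* Rough C analogue of a coarray-style one-sided access: each rank exposes
 * an array in an MPI window, and rank 0 reads rank 1's copy directly,
 * much as remote = a(:)[2] would in coarray Fortran.
 * Compile with mpicc; run with at least 2 ranks. */
#include <mpi.h>
#include <stdio.h>

#define N 8

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local[N];                       /* the "coarray" on this image */
    for (int i = 0; i < N; i++) local[i] = rank * 100.0 + i;

    MPI_Win win;
    MPI_Win_create(local, N * sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    double remote[N];
    if (rank == 0) {
        /* One-sided read of rank 1's data, without rank 1 participating. */
        MPI_Get(remote, N, MPI_DOUBLE, 1, 0, N, MPI_DOUBLE, win);
    }
    MPI_Win_fence(0, win);

    if (rank == 0)
        printf("rank 0 read %.1f ... %.1f from rank 1\n", remote[0], remote[N - 1]);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```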


IEEE International Conference on High Performance Computing, Data and Analytics | 2010

Scalable Identification of Load Imbalance in Parallel Executions Using Call Path Profiles

Nathan R. Tallent; Laksono Adhianto; John M. Mellor-Crummey

Applications must scale well to make efficient use of today's class of petascale computers, which contain hundreds of thousands of processor cores. Inefficiencies that do not even appear in modest-scale executions can become major bottlenecks in large-scale executions. Because scaling problems are often difficult to diagnose, there is a critical need for scalable tools that guide scientists to the root causes of scaling problems. Load imbalance is one of the most common scaling problems. To provide actionable insight into load imbalance, we present post-mortem parallel analysis techniques for pinpointing and quantifying load imbalance in the context of call path profiles of parallel programs. We show how to identify load imbalance in its static and dynamic context by using only low-overhead asynchronous call path profiling to locate regions of code responsible for communication wait time in SPMD executions. We describe the implementation of these techniques within HPCTOOLKIT.
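
As a loose illustration of what quantifying load imbalance means (not the paper's algorithm, which works on full call path profiles), one common summary compares each rank's measured cost against the mean across ranks. The MPI sketch below fabricates per-rank work purely for demonstration.

```c
/* Loose illustration of quantifying load imbalance: compare the maximum
 * per-rank cost to the mean cost.  An imbalance factor near 1.0 means
 * balanced work; larger values mean some ranks wait for a slow rank.
 * Compile with mpicc. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Stand-in for measured work: pretend higher ranks do more. */
    double t0 = MPI_Wtime();
    volatile double s = 0.0;
    for (long i = 0; i < 1000000L * (rank + 1); i++) s += 1.0;
    double my_cost = MPI_Wtime() - t0;

    double max_cost, sum_cost;
    MPI_Reduce(&my_cost, &max_cost, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    MPI_Reduce(&my_cost, &sum_cost, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        double mean = sum_cost / size;
        printf("mean cost %.4f s, max cost %.4f s, imbalance factor %.2f\n",
               mean, max_cost, max_cost / mean);
    }
    MPI_Finalize();
    return 0;
}
```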


Simulation Modelling Practice and Theory | 2007

Performance modeling of communication and computation in hybrid MPI and OpenMP applications

Laksono Adhianto; Barbara M. Chapman

Performance evaluation and modeling are crucial for optimizing parallel programs. Programs written using two programming models, such as MPI and OpenMP, require an analysis to determine both performance efficiency and the most suitable numbers of processes and threads for their execution on a given platform. To study both of these problems, we propose the construction of a model that is based upon a small number of parameters, but is able to capture the complexity of the runtime system. We must incorporate measurements of overheads introduced by each of the programming models, and thus need to model both the network and computational aspects of the system. We have combined two different techniques: static analysis, driven by the OpenUH compiler, to retrieve application signatures, and a parallelization overhead measurement benchmark, realized with Sphinx and PerfSuite, to collect system profiles. Finally, we propose a performance evaluation measurement to identify communication and computation efficiency. In this paper we describe our underlying framework and the performance model, and show how our tool can be applied to a sample code.
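
To make the modeling idea concrete, a highly simplified runtime model in the same spirit (not the paper's actual model) might combine a computation term that scales with processes and threads, a message-cost term, and an OpenMP fork/join overhead term. All parameter names and constants below are invented for illustration.

```c
/* Highly simplified illustration of predicting hybrid MPI+OpenMP runtime
 * from a few parameters.  All values here are made up for the example. */
#include <stdio.h>

typedef struct {
    double work_seconds;      /* sequential computation time            */
    double msg_latency;       /* per-message latency (s)                */
    double msg_per_byte;      /* inverse bandwidth (s/byte)             */
    double forkjoin_overhead; /* cost of one OpenMP parallel region (s) */
} model_params;

static double predict_runtime(const model_params *m,
                              int processes, int threads,
                              long messages, long bytes_per_message,
                              long parallel_regions)
{
    double compute = m->work_seconds / (processes * threads);
    double communicate = messages *
        (m->msg_latency + bytes_per_message * m->msg_per_byte);
    double omp_overhead = parallel_regions * m->forkjoin_overhead;
    return compute + communicate + omp_overhead;
}

int main(void)
{
    model_params m = { 120.0, 5e-6, 1e-9, 8e-6 };   /* illustrative values */
    for (int p = 1; p <= 8; p *= 2)
        for (int t = 1; t <= 8; t *= 2)
            printf("%d procs x %d threads -> %.2f s\n", p, t,
                   predict_runtime(&m, p, t, 1000, 1 << 20, 500));
    return 0;
}
```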


Journal of Physics: Conference Series | 2008

HPCToolkit: performance tools for scientific computing

Nathan R. Tallent; John M. Mellor-Crummey; Laksono Adhianto; Mike Fagan; Mark W. Krentel

As part of the U.S. Department of Energy's Scientific Discovery through Advanced Computing (SciDAC) program, science teams are tackling problems that require simulation and modeling on petascale computers. As part of activities associated with the SciDAC Center for Scalable Application Development Software (CScADS) and the Performance Engineering Research Institute (PERI), Rice University is building software tools for performance analysis of scientific applications on leadership-class platforms. In this poster abstract, we briefly describe the HPCToolkit performance tools and how they can be used to pinpoint bottlenecks in SPMD and multi-threaded parallel codes. We demonstrate HPCToolkit's utility by applying it to two SciDAC applications: the S3D code for simulation of turbulent combustion and the MFDn code for ab initio calculations of the microscopic structure of nuclei.


International Conference on Supercomputing | 2011

Scalable fine-grained call path tracing

Nathan R. Tallent; John M. Mellor-Crummey; Michael Franco; Reed Landrum; Laksono Adhianto

Applications must scale well to make efficient use of even medium-scale parallel systems. Because scaling problems are often difficult to diagnose, there is a critical need for scalable tools that guide scientists to the root causes of performance bottlenecks. Although tracing is a powerful performance-analysis technique, tools that employ it can quickly become bottlenecks themselves. Moreover, to obtain actionable performance feedback for modular parallel software systems, it is often necessary to collect and present fine-grained context-sensitive data --- the very thing scalable tools avoid. While existing tracing tools can collect calling contexts, they do so only in a coarse-grained fashion; and no prior tool scalably presents both context- and time-sensitive data. This paper describes how to collect, analyze and present fine-grained call path traces for parallel programs. To scale our measurements, we use asynchronous sampling, whose granularity is controlled by a sampling frequency, and a compact representation. To present traces at multiple levels of abstraction and at arbitrary resolutions, we use sampling to render complementary slices of calling-context-sensitive trace data. Because our techniques are general, they can be used on applications that use different parallel programming models (MPI, OpenMP, PGAS). This work is implemented in HPCToolkit.
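
A minimal sketch of the compact-trace idea described above (not HPCToolkit's actual trace format): each sample is a small fixed-size record pairing a timestamp with an integer that names a node in a calling-context tree, so full call paths are stored once and referenced by id. The buffer type and field names below are assumptions made for illustration.

```c
/* Minimal sketch of a compact call path trace: fixed-size records of
 * (timestamp, calling-context id), appended as samples arrive. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

typedef struct {
    uint64_t time_ns;   /* when the sample was taken                       */
    uint32_t cct_id;    /* id of the call path (calling-context-tree node) */
} trace_record;

typedef struct {
    trace_record *records;
    size_t count, capacity;
} trace_buffer;

static void trace_append(trace_buffer *tb, uint32_t cct_id)
{
    if (tb->count == tb->capacity) {
        tb->capacity = tb->capacity ? tb->capacity * 2 : 1024;
        trace_record *nr = realloc(tb->records, tb->capacity * sizeof(trace_record));
        if (!nr) { perror("realloc"); exit(1); }
        tb->records = nr;
    }
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    tb->records[tb->count].time_ns = (uint64_t)ts.tv_sec * 1000000000u + ts.tv_nsec;
    tb->records[tb->count].cct_id = cct_id;
    tb->count++;
}

int main(void)
{
    trace_buffer tb = {0};
    /* Pretend a sampling handler observed these call paths over time. */
    for (uint32_t i = 0; i < 10; i++)
        trace_append(&tb, i % 3);
    printf("%zu records, %zu bytes each\n", tb.count, sizeof(trace_record));
    free(tb.records);
    return 0;
}
```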


IEEE International Conference on High Performance Computing, Data and Analytics | 2009

Diagnosing performance bottlenecks in emerging petascale applications

Nathan R. Tallent; John M. Mellor-Crummey; Laksono Adhianto; Mike Fagan; Mark W. Krentel

Cutting-edge science and engineering applications require petascale computing. It is, however, a significant challenge to use petascale computing platforms effectively. Consequently, there is a critical need for performance tools that enable scientists to understand impediments to performance on emerging petascale systems. In this paper, we describe HPCToolkit, a suite of multi-platform tools that supports sampling-based analysis of application performance on emerging petascale platforms. HPCToolkit uses sampling to pinpoint and quantify both scaling and node performance bottlenecks. We study several emerging petascale applications on the Cray XT and IBM BlueGene/P platforms and use HPCToolkit to identify specific source lines, in their full calling context, associated with performance bottlenecks in these codes. Such information is exactly what application developers need to know to improve their applications to take full advantage of the power of petascale systems.


International Parallel and Distributed Processing Symposium | 2011

Implementation and Performance Evaluation of the HPC Challenge Benchmarks in Coarray Fortran 2.0

Guohua Jin; John M. Mellor-Crummey; Laksono Adhianto; William N. Scherer; Chaoran Yang

Today's largest supercomputers have over two hundred thousand CPU cores, and even larger systems are under development. Typically, these systems are programmed using message passing. Over the past decade, there has been considerable interest in developing simpler and more expressive programming models for them. Partitioned global address space (PGAS) languages are viewed as perhaps the most promising alternative. In this paper, we report on our experience developing a set of PGAS extensions to Fortran that we call Coarray Fortran 2.0 (CAF 2.0). Our design for CAF 2.0 goes well beyond the original 1998 design of Coarray Fortran (CAF) by Numrich and Reid. CAF 2.0 includes language support for many features including teams, collective communication, asynchronous communication, function shipping, and synchronization. We describe the implementation of these features and our experiences using them to implement the High Performance Computing Challenge (HPCC) benchmarks, including High Performance Linpack (HPL), Random Access, Fast Fourier Transform (FFT), and STREAM triad. On 4096 CPU cores of a Cray XT with 2.3 GHz single-socket quad-core Opteron processors, we achieved 18.3 TFLOP/s with HPL, 2.01 GUP/s with Random Access, 125 GFLOP/s with FFT, and a bandwidth of 8.73 TByte/s with STREAM triad.
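
For reference, the STREAM triad kernel whose bandwidth is quoted above is a simple scaled vector update. The version below is a C/OpenMP rendering for illustration only; the paper's implementation is written in Coarray Fortran 2.0, and the array size here is an arbitrary choice.

```c
/* The STREAM triad kernel rendered in C with OpenMP:
 * a[i] = b[i] + scalar * c[i], with bandwidth computed from the three
 * arrays moved per iteration.  Compile with -fopenmp (or equivalent). */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N (1L << 24)   /* ~16M doubles per array */

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    const double scalar = 3.0;

    #pragma omp parallel for
    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    double t0 = omp_get_wtime();
    #pragma omp parallel for
    for (long i = 0; i < N; i++)
        a[i] = b[i] + scalar * c[i];          /* the triad */
    double seconds = omp_get_wtime() - t0;

    /* Triad touches 3 arrays of N doubles: two reads and one write. */
    double gbytes = 3.0 * N * sizeof(double) / 1e9;
    printf("triad: %.3f s, %.2f GB/s (a[0]=%.1f)\n",
           seconds, gbytes / seconds, a[0]);

    free(a); free(b); free(c);
    return 0;
}
```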


Parallel and Distributed Computing: Applications and Technologies | 2003

Dragon: an Open64-based interactive program analysis tool for large applications

Barbara M. Chapman; Oscar R. Hernandez; Lei Huang; Tien-Hsiung Weng; Zhenying Liu; Laksono Adhianto; Yi Wen

A program analysis tool can play an important role in helping users understand and improve large application codes. Dragon is a robust interactive program analysis tool based on the Open64 compiler, an open-source C/C++/Fortran 77/90 compiler for Intel Itanium systems. We designed and developed the Dragon analysis tool to support manual optimization and parallelization of large applications by exploiting the powerful analyses of the Open64 compiler. Dragon enables users to visualize and print the essential program structure of their large applications and to obtain information about them. Current features include the call graph, flow graph, and data dependences. Ongoing work extends both Open64 and Dragon with a new call graph construction algorithm and its related interprocedural analysis, global variable definition and usage analysis, and an external interface that can be used by other tools such as profilers and debuggers to share program analysis information. Future work includes supporting the creation and optimization of shared memory parallel programs written using OpenMP.


Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model | 2010

Hiding latency in Coarray Fortran 2.0

William N. Scherer; Laksono Adhianto; Guohua Jin; John M. Mellor-Crummey; Chaoran Yang

In Numrich and Reid's 1998 proposal [17], Coarray Fortran is a simple set of extensions to Fortran 95, principal among which is support for shared data known as coarrays. Responding to shortcomings in the Fortran Standards Committee's addition of coarrays to the Fortran 2008 standard, we at Rice envisioned an extensive update which has come to be known as Coarray Fortran 2.0 [15]. In this paper, we chronicle the evolution of Coarray Fortran 2.0 as it gains support for asynchronous point-to-point and collective operations. We outline how these operations are implemented and describe code fragments from several benchmark programs to show how we use these operations to hide latency by overlapping communication and computation.
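
The latency-hiding pattern described here, start a transfer, compute while it is in flight, then wait before consuming the data, can be shown in C with MPI nonblocking point-to-point as a rough stand-in for CAF 2.0's asynchronous operations. The buffer sizes and two-rank setup below are arbitrary choices for the sketch.

```c
/* Overlapping communication and computation with MPI nonblocking
 * point-to-point (a stand-in for CAF 2.0's asynchronous operations):
 * start the transfer, compute on other data while it is in flight,
 * then wait before using the received buffer.
 * Compile with mpicc; run with exactly 2 ranks. */
#include <mpi.h>
#include <stdio.h>

#define N 1000000

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    static double sendbuf[N], recvbuf[N], local[N];
    for (int i = 0; i < N; i++) { sendbuf[i] = rank + 1.0; local[i] = i; }

    int peer = 1 - rank;                     /* assumes exactly 2 ranks */
    MPI_Request reqs[2];
    MPI_Irecv(recvbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(sendbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[1]);

    /* Useful work that does not touch the buffers in flight. */
    double acc = 0.0;
    for (int i = 0; i < N; i++) acc += local[i] * 0.5;

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);   /* transfer complete */

    /* Now it is safe to consume the received data. */
    acc += recvbuf[0];
    printf("rank %d: acc = %.1f\n", rank, acc);

    MPI_Finalize();
    return 0;
}
```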

Collaboration


Dive into Laksono Adhianto's collaborations.

Top Co-Authors

Nathan R. Tallent

Pacific Northwest National Laboratory

Gabriel Marin

Oak Ridge National Laboratory
