Hasan Abbasi
Oak Ridge National Laboratory
Publication
Featured research published by Hasan Abbasi.
Concurrency and Computation: Practice and Experience | 2014
Qing Liu; Jeremy Logan; Yuan Tian; Hasan Abbasi; Norbert Podhorszki; Jong Youl Choi; Scott Klasky; Roselyne Tchoua; Jay F. Lofstead; Ron A. Oldfield; Manish Parashar; Nagiza F. Samatova; Karsten Schwan; Arie Shoshani; Matthew Wolf; Kesheng Wu; Weikuan Yu
Applications running on leadership platforms are more and more bottlenecked by storage input/output (I/O). In an effort to combat the increasing disparity between I/O throughput and compute capability, we created the Adaptable IO System (ADIOS) in 2005. Focusing on putting users first with a service-oriented architecture, we combined cutting-edge research into new I/O techniques with a design effort to create near-optimal I/O methods. As a result, ADIOS provides the highest level of synchronous I/O performance for a number of mission-critical applications at various Department of Energy Leadership Computing Facilities. Meanwhile, ADIOS is leading the push for next-generation techniques including staging and data processing pipelines. In this paper, we describe the startling observations we have made in the last half decade of I/O research and development, and elaborate on the lessons we have learned along this journey. We also detail some of the challenges that remain as we look toward the coming Exascale era.
international parallel and distributed processing symposium | 2012
Fan Zhang; Ciprian Docan; Manish Parashar; Scott Klasky; Norbert Podhorszki; Hasan Abbasi
Emerging scientific application workflows are composed of heterogeneous coupled component applications that simulate different aspects of the physical phenomena being modeled, and that interact and exchange significant volumes of data at runtime. With the increasing performance gap between on-chip data sharing and off-chip data transfers in current systems based on multicore processors, moving large volumes of data using communication network fabric can significantly impact performance. As a result, minimizing the amount of inter-application data exchanges that cross compute nodes and use the network is critical to achieving overall application performance and system efficiency. In this paper, we investigate the in-situ execution of the coupled components of a scientific application workflow so as to maximize on-chip exchange of data. Specifically, we present a distributed data sharing and task execution framework that (1) employs data-centric task placement to map computations from the coupled applications onto processor cores so that a large portion of the data exchanges can be performed using the intra-node shared memory, and (2) provides a shared space programming abstraction that supplements existing parallel programming models (e.g., message passing) with specialized one-sided asynchronous data access operators and can be used to express coordination and data exchanges between the coupled components. We also present the implementation of the framework and its experimental evaluation on the Jaguar Cray XT5 at Oak Ridge National Laboratory.
international parallel and distributed processing symposium | 2013
Fang Zheng; Hongbo Zou; Greg Eisenhauer; Karsten Schwan; Matthew Wolf; Jai Dayal; Tuan-Anh Nguyen; Jianting Cao; Hasan Abbasi; Scott Klasky; Norbert Podhorszki; Hongfeng Yu
Increasingly severe I/O bottlenecks on High-End Computing machines are prompting scientists to process simulation output data online while simulations are running and before storing data on disk. There are several options to place data analytics along the I/O path: on compute nodes, on separate nodes dedicated to analytics, or after data is stored on persistent storage. Since different placements have different impact on performance and cost, there is a consequent need for flexibility in the location of data analytics. The FlexIO middleware described in this paper makes it easy for scientists to obtain such flexibility, by offering simple abstractions and diverse data movement methods to couple simulation with analytics. Various placement policies can be built on top of FlexIO to exploit the trade-offs in performing analytics at different levels of the I/O hierarchy. Experimental results demonstrate that FlexIO can support a variety of simulation and analytics workloads at large scale through flexible placement options, efficient data movement, and dynamic deployment of data manipulation functionalities.
ieee international conference on high performance computing data and analytics | 2013
Fang Zheng; Hongfeng Yu; Can Hantaş; Matthew Wolf; Greg Eisenhauer; Karsten Schwan; Hasan Abbasi; Scott Klasky
Severe I/O bottlenecks on High End Computing platforms call for running data analytics in situ. Demonstrating that there exist considerable resources in compute nodes unused by typical high end scientific simulations, we leverage this fact by creating an agile runtime, termed GoldRush, that can harvest those otherwise wasted, idle resources to efficiently run in situ data analytics. GoldRush uses fine-grained scheduling to “steal” idle resources in ways that minimize interference between the simulation and in situ analytics. This involves recognizing the potential causes of on-node resource contention and then using scheduling methods that prevent them. Experiments with representative science applications at large scales show that resources harvested on compute nodes can be leveraged to perform useful analytics, significantly improving resource efficiency, reducing data movement costs incurred by alternate solutions, and posing negligible impact on scientific simulations.
cluster computing and the grid | 2014
Jai Dayal; Drew Bratcher; Greg Eisenhauer; Karsten Schwan; Matthew Wolf; Xuechen Zhang; Hasan Abbasi; Scott Klasky; Norbert Podhorszki
As high-end systems move toward exascale sizes, a new model of scientific inquiry being developed is one in which online data analytics run concurrently with the high end simulations producing data outputs. Goals are to gain rapid insights into the ongoing scientific processes, assess their scientific validity, and/or initiate corrective or supplementary actions by launching additional computations when needed. The Flexpath system presented in this paper addresses the fundamental problem of how to structure and efficiently implement the communications between high end simulations and concurrently running online data analytics, the latter comprised of componentized dynamic services and service pipelines. Using a type-based publish/subscribe approach, Flexpath encourages diversity by permitting analytics services to differ in their computational and scaling characteristics and even in their internal execution models. Flexpath uses direct and MxN connections between interacting services to reduce data movements, to allow for runtime connectivity changes to accommodate component arrivals/departures, and to support the multiple underlying communication protocols used for analytics workflows in which simulation outputs are processed by analytics services residing on the same nodes where they are generated, on the same machine, and/or on attached or remote analytics engines. This paper describes the design and implementation of Flexpath, and evaluates it with two widely used scientific applications and their associated data analytics methods.
ieee vgtc conference on visualization | 2016
Andrew C. Bauer; Hasan Abbasi; James P. Ahrens; Hank Childs; Berk Geveci; Scott Klasky; Kenneth Moreland; Patrick O'Leary; Venkatram Vishwanath; Brad Whitlock; E.W. Bethel
The considerable interest in the high performance computing (HPC) community regarding analyzing and visualizing data without first writing to disk, i.e., in situ processing, is due to several factors. First is an I/O cost savings, where data is analyzed/visualized while being generated, without first storing to a filesystem. Second is the potential for increased accuracy, where fine temporal sampling of transient analysis might expose some complex behavior missed in coarse temporal sampling. Third is the ability to use all available resources, CPUs and accelerators, in the computation of analysis products. This STAR paper brings together researchers, developers and practitioners using in situ methods in extreme-scale HPC with the goal to present existing methods, infrastructures, and a range of computational science and engineering applications using in situ analysis and visualization.
ieee international conference on high performance computing data and analytics | 2013
Tong Jin; Fan Zhang; Qian Sun; Hoang Bui; Manish Parashar; Hongfeng Yu; Scott Klasky; Norbert Podhorszki; Hasan Abbasi
As system scales and application complexity grow, managing and processing simulation data has become a significant challenge. While recent approaches based on data staging and in-situ/in-transit data processing are promising, dynamic data volumes and distributions, such as those occurring in AMR-based simulations, make the efficient use of these techniques challenging. In this paper we propose cross-layer adaptations that address these challenges and respond at runtime to dynamic data management requirements. Specifically, we explore (1) adaptations of the spatial resolution at which the data is processed, (2) dynamic placement and scheduling of data processing kernels, and (3) dynamic allocation of in-transit resources. We also exploit co-ordinated approaches that dynamically combine these adaptations at the different layers. We evaluate the performance of our adaptive cross-layer management approach on the Intrepid IBM-BlueGene/P and Titan Cray-XK7 systems using Chombo-based AMR applications, and demonstrate its effectiveness in improving overall time-to-solution and increasing resource efficiency.
international conference on parallel processing | 2012
Jeremy Logan; Scott Klasky; Hasan Abbasi; Qing Liu; George Ostrouchov; Manish Parashar; Norbert Podhorszki; Yuan Tian; Matthew Wolf
We address the difficulty involved in obtaining meaningful measurements of I/O performance in HPC applications, as well as the further challenge of understanding the causes of I/O bottlenecks in these applications. The need for I/O optimization is critical given the difficulty in scaling I/O to ever increasing numbers of processing cores. To address this need, we have pioneered a new approach to the analysis of I/O performance using automatic generation of I/O benchmark codes given a high-level description of an application's I/O pattern. By combining this with low-level characterization of the performance of the various components of the underlying I/O method, we are able to produce a complete picture of the I/O behavior of an application. We compare the performance measurements obtained using Skel, the tool that implements our approach, with those of an instrumented version of the original application to show that our approach is accurate. We demonstrate the use of Skel to compare the performance of several I/O methods. Finally, we show that the detailed breakdown of timing information produced by Skel provides better understanding of the reasons for the performance differences between the examined I/O methods. We conclude that our approach facilitates faster, more accurate and more meaningful I/O performance testing, allowing application I/O performance to be predicted, and new systems and I/O methods to be evaluated.
international conference on e-science | 2011
Greg Eisenhauer; Matthew Wolf; Hasan Abbasi; Scott Klasky; Karsten Schwan
The manner in which data is represented, accessed and transmitted has an effect upon the efficiency of any computing system. In the domain of high performance computing, traditional frameworks like MPI have relied upon a relatively static type system with a high degree of a priori knowledge shared among the participants. However, modern scientific computing is increasingly distributed and dynamic, requiring the ability to dynamically create multi-platform workflows, to move processing to data, and to perform both in situ and streaming data analysis. Traditional approaches to data type description and communication in middleware, which typically either require a priori agreement on data types, or resort to highly inefficient representations like XML, are insufficient for the new domain of dynamic science. This paper describes a different approach, using FFS, a middleware library that implements efficient manipulation of application-level data. FFS provides for highly efficient binary data communication, XML-like examination of unknown data, and both third-party and in situ data processing via dynamic code generation. All of these capabilities are fully dynamic at run-time, without requiring a priori agreements or knowledge of the exact form of the data being communicated or analyzed.
international conference on e-science | 2011
Jeremy Logan; Scott Klasky; Jay F. Lofstead; Hasan Abbasi; Stephane Ethier; Ray W. Grout; S. Ku; Qing Liu; Xiaosong Ma; Manish Parashar; Norbert Podhorszki; Karsten Schwan; Matthew Wolf
Massively parallel computations consist of a mixture of computation, communication, and I/O. Of these three components, implementing an effective parallel I/O solution has often been overlooked by application scientists and has typically been added to large scale simulations only when existing serial techniques have failed. As teams of scientists scaled their codes to run on hundreds of processors, it was common to call on an I/O expert to implement a set of more scalable I/O routines. These routines were easily separated from the calculations and communication, and in many cases, an I/O kernel was derived from the application which could be used for testing I/O performance independent of the application. These I/O kernels developed a life of their own, used as a broad measure for comparing different I/O techniques. Unfortunately, as years passed and computation and communication changes required changes to the I/O, the separate I/O kernel used for benchmarking remained static, no longer providing an accurate indicator of the I/O performance of the simulation, and making I/O research less relevant for the application scientists. In this paper we describe a new approach to this problem where I/O kernels are replaced with skeletal I/O applications that are automatically generated from an abstract set of simulation I/O parameters. We realize this abstraction by leveraging the ADIOS [1] middleware's XML I/O specification with additional runtime parameters. Skeletal applications offer all of the benefits of I/O kernels, including allowing I/O optimizations to focus on useful I/O patterns. Moreover, since they are automatically generated, it is easy to produce an updated I/O skeleton whenever the simulation's I/O changes. In this paper we analyze the performance of automatically generated I/O skeletal applications for the S3D and GTS codes. We show that these skeletal applications achieve performance comparable to that of the production applications. We wrap up the paper with a discussion of future changes to make the skeletal application better approximate the actual I/O performed in the simulation.
