Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Hartmut Kaiser is active.

Publication


Featured research published by Hartmut Kaiser.


International Conference on Parallel Processing | 2009

ParalleX: An Advanced Parallel Execution Model for Scaling-Impaired Applications

Hartmut Kaiser; Maciek Brodowicz; Thomas L. Sterling

High performance computing (HPC) is experiencing a phase change with the challenges of programming and managing heterogeneous multicore system architectures and large-scale system configurations. It is estimated that by the end of the next decade, Exaflops computing systems requiring hundreds of millions of cores and demanding multi-billion-way parallelism, with a power budget of 50 Gflops/watt, may emerge. At the same time, there are many scaling-challenged applications that, although taking many weeks to complete, cannot scale even to a thousand cores using conventional distributed programming models. This paper describes an experimental methodology, ParalleX, that addresses these challenges through a change in the fundamental model of parallel computation: from communicating sequential processes (e.g., MPI) to an innovative synthesis of concepts involving message-driven work-queue execution in the context of a global address space. The focus of this work is a new runtime system required to test, validate, and evaluate the use of ParalleX concepts for extreme scalability. This paper describes the ParalleX model and the HPX runtime system and discusses how both strategies contribute to the goal of extreme computing through dynamic asynchronous execution. The paper presents the first early experimental results of tests using a proof-of-concept runtime-system implementation. These results are very promising and are guiding future work towards a full-scale parallel programming and runtime environment.
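
To make the execution style concrete, here is a minimal sketch of the future-based task parallelism that the HPX runtime exposes. It is illustrative only: the code is not from the paper, and the header names follow recent HPX releases.

```cpp
// Minimal sketch of ParalleX-style lightweight task parallelism with HPX.
// Not the paper's code; header names follow recent HPX releases.
#include <hpx/hpx_main.hpp>   // wraps main() so the HPX runtime is started
#include <hpx/hpx.hpp>
#include <iostream>

int fib(int n)
{
    if (n < 2)
        return n;
    // Each recursive call becomes a lightweight HPX task; the runtime's
    // work queues schedule it instead of blocking an OS thread.
    hpx::future<int> lhs = hpx::async(fib, n - 1);
    int rhs = fib(n - 2);
    return lhs.get() + rhs;
}

int main()
{
    std::cout << "fib(10) = " << fib(10) << "\n";
    return 0;
}
```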


Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models | 2014

HPX: A Task Based Programming Model in a Global Address Space

Hartmut Kaiser; Thomas Heller; Bryce Adelstein-Lelbach; Adrian Serio; Dietmar Fey

The significant increase in complexity of Exascale platforms due to energy-constrained, billion-way parallelism, with major changes to processor and memory architecture, requires new energy-efficient and resilient programming techniques that are portable across multiple future generations of machines. We believe that guaranteeing adequate scalability, programmability, performance portability, resilience, and energy efficiency requires a fundamentally new approach, combined with a transition path for existing scientific applications, to fully explore the rewards of today's and tomorrow's systems. We present HPX -- a parallel runtime system which extends the C++11/14 standard to facilitate distributed operations, enable fine-grained constraint-based parallelism, and support runtime-adaptive resource management. This provides a widely accepted API enabling programmability, composability, and performance portability of user applications. By employing a global address space, we seamlessly extend the standard to the distributed case. We present HPX's architecture, design decisions, and results selected from a diverse set of application runs showing superior performance, scalability, and efficiency over conventional practice.
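
The sketch below illustrates how HPX carries the C++ async/future pattern across the global address space: a plain function is registered as an action and invoked on a (possibly remote) locality. This is a hedged illustration, not code from the paper; macro and function names follow recent HPX releases.

```cpp
// Hedged sketch: invoking work on a (possibly remote) locality through
// HPX's global address space. Names follow recent HPX releases.
#include <hpx/hpx_main.hpp>
#include <hpx/hpx.hpp>
#include <iostream>
#include <vector>

int square(int x) { return x * x; }
// Register the function so it can be invoked across locality boundaries.
HPX_PLAIN_ACTION(square, square_action);

int main()
{
    // In a distributed run this returns every participating locality;
    // here we simply target the first one.
    std::vector<hpx::id_type> localities = hpx::find_all_localities();
    hpx::future<int> f = hpx::async(square_action{}, localities[0], 7);
    std::cout << f.get() << "\n";   // prints 49
    return 0;
}
```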


Cluster Computing and the Grid | 2009

Programming Abstractions for Data Intensive Computing on Clouds and Grids

Chris Miceli; Michael V. Miceli; Shantenu Jha; Hartmut Kaiser; Andre Merzky

MapReduce has emerged as an important data-parallel programming model for data-intensive computing on Clouds and Grids. However, most, if not all, implementations of MapReduce are coupled to a specific infrastructure. SAGA is a high-level programming interface which provides the ability to create distributed applications in an infrastructure-independent way. In this paper, we show how MapReduce has been implemented using SAGA and demonstrate its interoperability across different distributed platforms: Grids, Cloud-like infrastructures, and Clouds. We discuss the advantages of programmatically developing MapReduce using SAGA by demonstrating that the SAGA-based implementation is infrastructure-independent whilst still providing control over deployment, distribution, and runtime decomposition. The ability to control the distribution and placement of the computation units (workers) is critical for moving computational work to the data. This is required to keep data network transfers low and, in the case of commercial Clouds, to keep the monetary cost of computing the solution low. Using data sets of up to 10 GB and up to 10 workers, we provide a detailed performance analysis of the SAGA-MapReduce implementation and show how controlling the distribution of computation and the payload per worker helps enhance performance.
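
For readers unfamiliar with the programming model, here is a deliberately generic, single-process word count in the map/reduce style. It is not the SAGA-MapReduce code; it only illustrates the two phases that the SAGA implementation distributes across workers.

```cpp
// Generic single-process word count in the map/reduce style.
// The SAGA-based implementation distributes the map tasks (workers)
// across Grid/Cloud resources; this sketch shows only the two phases.
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Map phase: emit (word, 1) pairs for one input chunk.
std::vector<std::pair<std::string, int>> map_chunk(const std::string& chunk)
{
    std::vector<std::pair<std::string, int>> out;
    std::istringstream in(chunk);
    std::string word;
    while (in >> word)
        out.emplace_back(word, 1);
    return out;
}

// Reduce phase: sum the counts per key.
std::map<std::string, int> reduce(
    const std::vector<std::pair<std::string, int>>& pairs)
{
    std::map<std::string, int> counts;
    for (const auto& kv : pairs)
        counts[kv.first] += kv.second;
    return counts;
}

int main()
{
    auto pairs = map_chunk("to be or not to be");
    for (const auto& kv : reduce(pairs))
        std::cout << kv.first << ": " << kv.second << "\n";
}
```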


Measurement and Modeling of Computer Systems | 2011

Preliminary design examination of the ParalleX system from a software and hardware perspective

Alexandre Tabbal; Matthew Anderson; Maciej Brodowicz; Hartmut Kaiser; Thomas L. Sterling

Exascale systems, expected to emerge by the end of the next decade, will require the exploitation of billion-way parallelism at multiple hierarchical levels in order to achieve the desired sustained performance. While traditional approaches to performance evaluation involve measurements of existing applications on available platforms, such a methodology is obviously unsuitable for architectures still at the brainstorming stage. The prediction of future machine performance is an important factor driving the design of both the execution hardware and the software environment. A good way to start assessing performance is to identify the factors challenging the scalability of parallel applications. We believe the root cause of these challenges is the incoherent coupling between current enabling technologies, such as the Non-Uniform Memory Access of present multicore nodes equipped with optional hardware accelerators, and the decades-older execution model, i.e., Communicating Sequential Processes (CSP). Supercomputing is in the midst of a much-needed phase change, and the High-Performance Computing community is slowly realizing the necessity for a new design dogma, as affirmed in the preliminary Exascale studies. In this paper, we present an overview of the ParalleX execution model and its complementary design efforts at the software and hardware levels, while treating the power draw of the system as the resource of utmost importance. Since the interplay of the hardware and software environments is quickly becoming one of the dominant factors in the design of well-integrated, energy-efficient, large-scale systems, we also explore the implications of the ParalleX model on the organization of parallel computing architectures. We also present scaling and performance results for an adaptive mesh refinement application developed using a ParalleX-compliant runtime system implementation, HPX.


Future Generation Computer Systems | 2006

Distributed and collaborative visualization of large data sets using high-speed networks

Andrei Hutanu; Gabrielle Allen; Stephen David Beck; Petr Holub; Hartmut Kaiser; Archit Kulshrestha; Miloš Liška; Jon MacLaren; Ludek Matyska; Ravi Paruchuri; Steffen Prohaska; Edward Seidel; Brygg Ullmer; Shalini Venkataraman

We describe an architecture for distributed collaborative visualization that integrates video conferencing, distributed data management, and grid technologies, as well as tangible interaction devices for visualization. High-speed, low-latency optical networks support high-quality collaborative interaction and remote visualization of large data sets.


International Conference on Conceptual Structures | 2012

Urgent Computing of Storm Surge for North Carolina's Coast

Brian Blanton; John McGee; Jason G. Fleming; Carola Kaiser; Hartmut Kaiser; Howard Lander; Richard A. Luettich; Kendra M. Dresback; Randy Kolar

Forecasting and prediction of natural events, such as tropical and extra-tropical cyclones, inland flooding, and severe winter weather, provide critical guidance to emergency managers and decision-makers from the local to the national level, with the goal of minimizing both human and economic losses. This guidance is used to facilitate evacuation route planning, post-disaster response and resource deployment, and the protection and securing of critical infrastructure, and it must be available within a time window in which decision-makers can take appropriate action. It is this latter element that creates the need for urgency in this area. In this paper, we outline the North Carolina Forecasting System (NCFS) for storm surge and waves along coastal North Carolina, which is threatened by tropical cyclones about once every three years. We initially used advanced cyberinfrastructure techniques (e.g., opportunistic grid computing) in an effort to provide timely guidance on storm surge and wave impacts. However, our experience has been that a distributed computing approach is not robust enough to consistently produce the real-time results that end users expect. As a result, our technical approach has shifted: the reliable and timely delivery of forecast products is now guaranteed by provisioning dedicated computational resources rather than relying on the opportunistic availability of external resources. Our experiences with this forecasting effort are discussed in this paper, with a focus on Hurricane Irene (2011), which impacted a substantial portion of the US east coast from North Carolina, up along the eastern seaboard, and into New England.


IEEE International Conference on High Performance Computing Data and Analytics | 2012

Improving the scalability of parallel N-body applications with an event-driven constraint-based execution model

Chirag Dekate; Matthew Anderson; Maciej Brodowicz; Hartmut Kaiser; Bryce Adelstein-Lelbach; Thomas L. Sterling

The scalability and efficiency of graph applications are significantly constrained by conventional systems and their supporting programming models. Technology trends such as multicore, manycore, and heterogeneous system architectures are introducing further challenges and possibilities for emerging application domains such as graph applications. This paper explores the parallel execution of graphs that are generated using the Barnes–Hut algorithm to exemplify dynamic workloads. The workloads are expressed using the semantics of an exascale computing execution model called ParalleX. For comparison, results using conventional execution model semantics are also presented. We find improved load balancing during runtime and automatic parallelism discovery by using the advanced semantics for exascale computing.
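
A minimal sketch of what "event-driven constraint-based" execution means in practice: a computation fires as soon as the futures it depends on are satisfied, with no global barrier. The HPX names below follow recent releases; the example is illustrative, not the paper's Barnes-Hut code.

```cpp
// Hedged sketch of constraint-based execution: the continuation runs the
// instant its input futures (the "constraints") become ready, with no
// global barrier. Illustrative only, not the paper's N-body code.
#include <hpx/hpx_main.hpp>
#include <hpx/hpx.hpp>
#include <iostream>
#include <utility>

double combine(double a, double b) { return a + b; }

int main()
{
    hpx::future<double> fa = hpx::async([] { return 1.5; });
    hpx::future<double> fb = hpx::async([] { return 2.5; });

    // dataflow attaches the continuation to its input futures; it fires
    // as soon as both inputs are satisfied.
    hpx::future<double> sum =
        hpx::dataflow(hpx::unwrapping(combine), std::move(fa), std::move(fb));

    std::cout << sum.get() << "\n";   // prints 4
    return 0;
}
```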


International Conference on e-Science | 2007

Design and Implementation of Network Performance Aware Applications Using SAGA and Cactus

Shantenu Jha; Hartmut Kaiser; Y. El Khamra; Ole Weidner

This paper demonstrates the use of appropriate programming abstractions, SAGA and Cactus, that facilitate the development of applications for distributed infrastructure. SAGA provides a high-level programming interface to Grid functionality; Cactus is an extensible, component-based framework for scientific applications. We show how SAGA can be integrated with Cactus to develop simple, useful, and easily extensible applications that can be deployed on a wide variety of distributed infrastructure, independent of the details of the resources. Our model application can gather and analyze network performance data and migrate across heterogeneous resources. We outline the architecture of our application and discuss how it provides important features required of eScience applications. As a proof of concept, we present details of the successful deployment of our application over distinct and heterogeneous Grids and present the network performance data gathered. We also discuss several interesting use cases for such an application, which can be used either as a stand-alone network diagnostic agent or in conjunction with more complex scientific applications.
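
As an illustration of the abstraction level SAGA targets, here is a hedged sketch of job submission following the OGF SAGA job API. The header and attribute names approximate the C++ reference implementation from memory and may differ between SAGA releases; treat them as assumptions.

```cpp
// Hedged sketch of SAGA-style job submission following the OGF SAGA job
// API. Header and attribute names approximate the C++ reference
// implementation and may differ between SAGA releases (assumptions).
#include <saga/saga.hpp>

int main()
{
    // The URL scheme selects the middleware adaptor: "fork://localhost"
    // runs locally, while a Grid scheme would target remote resources
    // through the same code path.
    saga::job::service js(saga::url("fork://localhost"));

    saga::job::description jd;
    jd.set_attribute(saga::job::attributes::description_executable,
                     "/bin/date");

    saga::job::job j = js.create_job(jd);
    j.run();
    j.wait();   // block until the job finishes
    return 0;
}
```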


International Conference on e-Science | 2007

Grid Interoperability at the Application Level Using SAGA

Shantenu Jha; Hartmut Kaiser; Andre Merzky; Ole Weidner

SAGA is a high-level programming abstraction which significantly facilitates the development and deployment of Grid-aware applications. The primary aim of this paper is to discuss how each of the three main components of the SAGA landscape (the interface specification, a specific implementation, and the different adaptors for middleware distributions) facilitates application-level interoperability. We discuss SAGA in relation to the ongoing GIN Community Group efforts and show the consistency of the SAGA approach with those efforts. We demonstrate how interoperability can be enabled by the use of SAGA through two simple yet meaningful applications: in the first, SAGA enables applications to exploit interoperability, and in the second, SAGA adaptors provide the basis for interoperability.


Computer Science - Research and Development | 2013

Application of the ParalleX execution model to stencil-based problems

Thomas Heller; Hartmut Kaiser; Klaus Iglberger

In the prospect of the upcoming exascale era with millions of execution units, the question of how to deal with this level of parallelism efficiently is of time-critical relevance. State-of-the-art parallelization techniques such as OpenMP and MPI are not guaranteed to solve the expected problems of starvation, growing latencies, overheads, and contention. On the other hand, new parallelization paradigms promise to efficiently hide latencies and contain starvation and contention. In this paper we analyze the performance of one novel parallelization strategy for shared- and distributed-memory machines. We focus on shared-memory architectures and compare the performance of the ParalleX execution model against the quasi-standard OpenMP for a standard stencil-based problem. We compare in detail the OpenMP implementations of two Jacobi solver applications (one based on a regular grid and one on an irregular grid structure) with the corresponding implementations using HPX (High Performance ParalleX), the first feature-complete, open-source implementation of ParalleX, and analyze the results of both implementations on a multi-socket NUMA node.
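
For context, the baseline kernel looks roughly like the following: one Jacobi relaxation sweep over a regular grid, parallelized with an OpenMP fork/join loop. This is a minimal sketch, not the paper's benchmark code; the HPX variant replaces the implicit barrier with futurized per-block dependencies.

```cpp
// Minimal sketch of the OpenMP-style baseline: one Jacobi sweep over a
// regular 1D grid. Not the paper's benchmark code. The HPX version
// replaces the fork/join loop (and its implicit global barrier) with
// futurized per-block tasks whose dependencies overlap naturally.
#include <vector>

void jacobi_sweep(const std::vector<double>& src, std::vector<double>& dst)
{
    // Fork/join data parallelism: all iterations must finish before the
    // next sweep starts (an implicit global barrier).
    #pragma omp parallel for
    for (long i = 1; i < static_cast<long>(src.size()) - 1; ++i)
        dst[i] = 0.5 * (src[i - 1] + src[i + 1]);
}
```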

Collaboration


Dive into Hartmut Kaiser's collaborations.

Top Co-Authors

Matthew Anderson, Indiana University Bloomington
Thomas Heller, University of Erlangen-Nuremberg
Adrian Serio, Louisiana State University
Andre Merzky, Louisiana State University
Maciej Brodowicz, Indiana University Bloomington
Zahra Khatami, Louisiana State University
J. Ramanujam, Louisiana State University
Andreas Schäfer, University of Erlangen-Nuremberg