Karl W. Schulz | Researchain

Publication

Featured research published by Karl W. Schulz.

international conference on parallel processing | 2011

The parallel C++ statistical library 'QUESO': quantification of uncertainty for estimation, simulation and optimization

Ernesto E. Prudencio; Karl W. Schulz

QUESO is a collection of statistical algorithms and programming constructs supporting research into the uncertainty quantification (UQ) of models and their predictions. It has been designed with three objectives: it should (a) be sufficiently abstract in order to handle a large spectrum of models, (b) be algorithmically extensible, allowing an easy insertion of new and improved algorithms, and (c) take advantage of parallel computing, in order to handle realistic models. Such objectives demand a combination of an object-oriented design with robust software engineering practices. QUESO is written in C++, uses MPI, and leverages libraries already available to the scientific community. We describe some UQ concepts, present QUESO, and list planned enhancements.

ieee international conference on high performance computing data and analytics | 2012

Design of a scalable InfiniBand topology service to enable network-topology-aware placement of processes

Hari Subramoni; Sreeram Potluri; Krishna Chaitanya Kandalla; Bill Barth; Jérôme Vienne; Jeff Keasler; Karen Tomko; Karl W. Schulz; Adam Moody; Dhabaleswar K. Panda

Over the last decade, InfiniBand has become an increasingly popular interconnect for deploying modern supercomputing systems. However, there exists no detection service that can discover the underlying network topology in a scalable manner and expose this information to runtime libraries and users of the high performance computing systems in a convenient way. In this paper, we design a novel and scalable method to detect the InfiniBand network topology by using Neighbor-Joining techniques (NJ). To the best of our knowledge, this is the first instance where the neighbor joining algorithm has been applied to solve the problem of detecting InfiniBand network topology. We also design a network-topology-aware MPI library that takes advantage of the network topology service. The library places processes taking part in the MPI job in a network-topology-aware manner with the dual aim of increasing intra-node communication and reducing the long distance inter-node communication across the InfiniBand fabric.

international conference on supercomputing | 2010

Quantifying performance benefits of overlap using MPI-2 in a seismic modeling application

Sreeram Potluri; Ping Lai; Karen Tomko; Sayantan Sur; Yifeng Cui; Mahidhar Tatineni; Karl W. Schulz; William L. Barth; Amitava Majumdar; Dhabaleswar K. Panda

AWM-Olsen is a widely used ground motion simulation code based on a parallel finite difference solution of the 3-D velocity-stress wave equation. This application runs on tens of thousands of cores and consumes several million CPU hours on the TeraGrid clusters every year. A significant portion of its run-time (37% in a 4,096-process run) is spent in MPI communication routines. Hence, it demands an optimized communication design coupled with a low-latency, high-bandwidth network and an efficient communication subsystem for good performance. In this paper, we analyze the performance bottlenecks of the application with regard to the time spent in MPI communication calls. We find that much of this time can be overlapped with computation using MPI non-blocking calls. We use both two-sided and MPI-2 one-sided communication semantics to re-design the communication in AWM-Olsen. We find that with our new design, using MPI-2 one-sided communication semantics, the entire application can be sped up by 12% at 4K processes and by 10% at 8K processes on a state-of-the-art InfiniBand cluster, Ranger at the Texas Advanced Computing Center (TACC).
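The relationship between the communication share and the achievable gain from overlap can be illustrated with a simple timing model. The sketch below is an idealized assumption, not the analysis used in the paper: run-time is normalized to 1.0, and the function names and the `hidden_fraction` parameter are hypothetical.

```python
def overlapped_runtime(comm_fraction, hidden_fraction):
    """Normalized run-time (baseline = 1.0) when part of the communication is hidden.

    comm_fraction:   share of the baseline run-time spent in MPI communication.
    hidden_fraction: share of that communication overlapped with computation.
    """
    if not (0.0 <= comm_fraction <= 1.0 and 0.0 <= hidden_fraction <= 1.0):
        raise ValueError("fractions must lie in [0, 1]")
    compute = 1.0 - comm_fraction
    # Communication can only hide behind computation that actually exists.
    hidden = min(comm_fraction * hidden_fraction, compute)
    return compute + comm_fraction - hidden


def improvement(comm_fraction, hidden_fraction):
    """Fractional reduction in run-time relative to the baseline."""
    return 1.0 - overlapped_runtime(comm_fraction, hidden_fraction)
```

Under this model, a 37% communication share caps the gain at 37% of run-time. If the reported 12% speed-up is read as a reduction in run-time, it corresponds to hiding roughly a third of the communication time (`improvement(0.37, 0.12 / 0.37)`).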

international conference on cluster computing | 2011

Design and Evaluation of Network Topology-/Speed- Aware Broadcast Algorithms for InfiniBand Clusters

Hari Subramoni; Krishna Chaitanya Kandalla; Jérôme Vienne; Sayantan Sur; Bill Barth; Karen Tomko; Robert T. McLay; Karl W. Schulz; Dhabaleswar K. Panda

It is an established fact that the network topology can have an impact on the performance of scientific parallel applications. However, little work has been done to design an easy-to-use solution inside a communication library supporting a parallel programming model where the complexities of making the application performance network-topology agnostic are hidden from the end user. Similarly, the rapid improvements in networking technology and speed are resulting in many commodity clusters becoming heterogeneous with respect to networking speed. For example, switches and adapters belonging to different generations (SDR - 8 Gbps, DDR - 16 Gbps and QDR - 36 Gbps speeds in InfiniBand) are integrated into a single system. This leads to an additional challenge to make the communication library aware of the performance implications of heterogeneous link speeds. Accordingly, the communication library can perform optimizations taking link speed into account. In this paper, we propose a framework to automatically detect the topology and speed of an InfiniBand network and make it available to users through an easy-to-use interface. We also make design changes inside the MPI library to dynamically query this topology detection service and to form a topology model of the underlying network. We have redesigned the broadcast algorithm to take into account this network topology information and dynamically adapt the communication pattern to best fit the characteristics of the underlying network. To the best of our knowledge, this is the first such work for InfiniBand clusters. Our experimental results show that, for large homogeneous systems and large message sizes, we get up to 14% improvement in the latency of the broadcast operation using our proposed network topology-aware scheme over the default scheme at the micro-benchmark level. At the application level, the proposed framework delivers up to 8% improvement in total application run-time, especially as job size scales up. The proposed network speed-aware algorithms attain micro-benchmark performance on the heterogeneous SDR-DDR InfiniBand cluster on par with runs on the DDR-only portion of the cluster for small to medium sized messages. We also demonstrate that the network speed-aware algorithms perform 70% to 100% better than the naive algorithms when both are run on the heterogeneous SDR-DDR InfiniBand cluster.

international conference on management of data | 2006

Scientific formats for object-relational database systems: a study of suitability and performance

Shirley Cohen; Patrick Hurley; Karl W. Schulz; William L. Barth; Brad Benton

Commercial database management systems (DBMSs) have historically seen very limited use within the scientific computing community. One reason for this absence is that previous database systems lacked support for the extensible data structures and performance features required within a high-performance computing context. However, database vendors have recently enhanced the functionality of their systems by adding object extensions to the relational engine. In principle, these extensions allow for the representation of a rich collection of scientific datatypes and common statistical operations. Utilizing these new extensions, this paper presents a study of the suitability of incorporating two popular scientific formats, NetCDF and HDF, into an object-relational system. To assess the performance of the database approach, a series of solution variables from a regional weather forecast model are used to build representative small, medium and large databases. Common statistical operations and array element queries are then performed using the object-relational database, and the execution timings are compared against native NetCDF and HDF operations.

Engineering With Computers | 2013

MASA: a library for verification using manufactured and analytical solutions

Nicholas Malaya; Kemelli C. Estacio-Hiroms; Roy H. Stogner; Karl W. Schulz; Paul T. Bauman; Graham F. Carey

In this paper we introduce the Manufactured Analytical Solution Abstraction (MASA) library for applying the method of manufactured solutions to the verification of software used for solving a large class of problems stemming from numerical methods in mathematical physics including nonlinear equations, systems of algebraic equations, and ordinary and partial differential equations. We discuss the process of scientific software verification, manufactured solution generation using symbolic manipulation with computer algebra systems such as Maple™ or SymPy, and automatic differentiation for forcing function evaluation. We discuss a hierarchic methodology that can be used to alleviate the combinatorial complexity in generating symbolic manufactured solutions for systems of equations based on complex physics. Finally, we detail the essential features and examples of the Application Programming Interface behind MASA, an open source library designed to act as a central repository for manufactured and analytical solutions over a diverse range of problems.
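The method of manufactured solutions can be illustrated with a small, self-contained sketch. It does not use the MASA API: the manufactured solution u(x) = sin(πx) and its forcing term f(x) = π² sin(πx) for −u″ = f are derived by hand here, and the solver and function names are hypothetical. The check is that a second-order finite-difference scheme shows an observed convergence order close to 2.

```python
import math


def solve_poisson(n, forcing):
    """Solve -u'' = f on (0, 1) with u(0) = u(1) = 0 using second-order
    central differences on n interior points (Thomas algorithm)."""
    h = 1.0 / (n + 1)
    x = [(i + 1) * h for i in range(n)]
    # Row i of the tridiagonal system: -u[i-1] + 2 u[i] - u[i+1] = h^2 f(x[i])
    diag = [2.0] * n
    rhs = [h * h * forcing(xi) for xi in x]
    # Forward elimination; both off-diagonals equal -1.
    for i in range(1, n):
        w = -1.0 / diag[i - 1]
        diag[i] += w          # diag[i] - w * (-1)
        rhs[i] -= w * rhs[i - 1]
    # Back substitution.
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (rhs[i] + u[i + 1]) / diag[i]
    return x, u


def manufactured_solution(x):
    return math.sin(math.pi * x)


def manufactured_forcing(x):
    # f = -u'' for u = sin(pi x)
    return math.pi ** 2 * math.sin(math.pi * x)


def max_error(n):
    """Largest pointwise difference between the numerical and manufactured solution."""
    x, u = solve_poisson(n, manufactured_forcing)
    return max(abs(ui - manufactured_solution(xi)) for xi, ui in zip(x, u))


def observed_order(n_coarse):
    """Convergence order estimated from two grids whose spacing differs by a factor of 2."""
    n_fine = 2 * n_coarse + 1  # halves h = 1 / (n + 1)
    return math.log2(max_error(n_coarse) / max_error(n_fine))
```

An observed order far from 2 (for example after introducing a sign error in the stencil) would signal a bug in the solver, which is the kind of verification evidence the method is meant to provide.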

ieee international conference on high performance computing data and analytics | 2011

Best practices for the deployment and management of production HPC clusters

Robert T. McLay; Karl W. Schulz; William L. Barth; Tommy Minyard

Commodity-based Linux HPC clusters dominate the scientific computing landscape in both academia and industry, ranging from small research clusters to petascale supercomputers supporting thousands of users. To support broad user communities and manage a user-friendly environment, end-user sites must combine a range of low-level system software with multiple compiler chains, support libraries, and a suite of 3rd party applications. In addition, large systems require bare-metal provisioning and a flexible software management strategy to maintain consistency and upgradeability across thousands of compute nodes. This report documents a Linux operating system framework (LosF), which has evolved over the last seven years to provide an integrated strategy for the deployment of multiple HPC systems at the Texas Advanced Computing Center. Documented within this effort are the high-level cluster configuration options and definitions, bare-metal provisioning, hierarchical HPC software stack design, package management, user environment management tools, user account synchronization, and local customization configurations.

Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models | 2014

Designing Scalable Out-of-core Sorting with Hybrid MPI+PGAS Programming Models

Jithin Jose; Sreeram Potluri; Hari Subramoni; Xiaoyi Lu; Khaled Hamidouche; Karl W. Schulz; Hari Sundar; Dhabaleswar K. Panda

While Hadoop holds the current Sort Benchmark record, previous research has shown that MPI-based solutions can deliver similar performance. However, most existing MPI-based designs rely on two-sided communication semantics. The emerging Partitioned Global Address Space (PGAS) programming model presents a flexible way to express parallelism for data-intensive applications. However, not all portions of data analytics applications are amenable to conversion using PGAS models. In this study, we propose a novel design of the out-of-core, k-way parallel sort algorithm that takes advantage of the features of both MPI and OpenSHMEM PGAS models. To the best of our knowledge, this is the first design of any data-intensive computing application using hybrid MPI + PGAS models. Our experimental evaluation indicates that our proposed framework outperforms the existing MPI-based design by up to 45% at 8,192 processes. It also achieves a 7x improvement over Hadoop-based sort using the same amount of resources at 1,024 cores.
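The out-of-core, k-way idea can be sketched on a single process: sort chunks that fit in memory, spill each as a sorted run to disk, then merge all runs at once. This is only an illustration of the general technique using the Python standard library; it does not show the paper's MPI or OpenSHMEM design, and `external_sort` and `chunk_size` are hypothetical names.

```python
import heapq
import os
import tempfile


def external_sort(values, chunk_size):
    """Yield the integers in `values` in ascending order using bounded memory.

    Phase 1: sort chunks of at most `chunk_size` items and spill each to a run file.
    Phase 2: k-way merge of all run files with a heap (heapq.merge).
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    with tempfile.TemporaryDirectory() as tmp:
        run_paths = []
        chunk = []

        def spill():
            # Write the current chunk to disk as one sorted run.
            chunk.sort()
            path = os.path.join(tmp, f"run{len(run_paths)}.txt")
            with open(path, "w") as f:
                f.writelines(f"{v}\n" for v in chunk)
            run_paths.append(path)
            chunk.clear()

        for v in values:
            chunk.append(v)
            if len(chunk) >= chunk_size:
                spill()
        if chunk:
            spill()

        files = [open(p) for p in run_paths]
        try:
            # Each run is read lazily, so only one item per run is held in memory.
            runs = [(int(line) for line in f) for f in files]
            yield from heapq.merge(*runs)
        finally:
            for f in files:
                f.close()
```

For example, `list(external_sort([5, 3, 9, 1, 7, 2, 8], chunk_size=3))` produces three runs on disk and merges them into one sorted sequence.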

Journal of Spacecraft and Rockets | 2011

Loose-coupling algorithm for simulating hypersonic flows with radiation and ablation

Paul T. Bauman; Roy H. Stogner; Graham F. Carey; Karl W. Schulz; Rochan Upadhyay; Andre Maurente

A procedure has been developed to couple a hypersonic reacting flow model, a radiative heat transfer model, and a surface ablation model to study the surface heat transfer and surface ablation rate of atmospheric reentry vehicles. The two-way loose-coupling algorithm is described for each of the models, as is the solution procedure to achieve convergence. Observations on the challenges of the loose-coupling strategy are given. Representative results are presented for two-dimensional benchmark examples and for three-dimensional flow at an angle of attack past a symmetric capsule based on the Crew Exploration Vehicle reentry vehicle. Effects due to the interaction with radiation and ablation are shown for two quantities of interest: the predicted peak surface heat flux and the ablation rate on the vehicle heat shield. Uncertain parameters are identified in each of the submodels, and a preliminary parameter sensitivity study is carried out by varying these values to examine their effects on the heat transfer and ablation rates in the coupled problem.
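The general shape of a loose-coupling iteration can be sketched as a fixed-point loop in which separate solvers exchange boundary quantities until they stop changing. The sketch below uses two toy scalar models instead of the three physical models in the paper; `toy_flow`, `toy_ablation`, and their coefficients are hypothetical and chosen only so that the iteration converges.

```python
def loosely_couple(flow_model, ablation_model, m0=0.0, tol=1e-10, max_iters=100):
    """Alternate between two solvers until the exchanged ablation rate settles.

    Returns (heat_flux, ablation_rate, iterations_used).
    """
    m = m0
    for iteration in range(1, max_iters + 1):
        q = flow_model(m)          # surface heat flux for the current ablation rate
        m_new = ablation_model(q)  # ablation rate produced by that heat flux
        if abs(m_new - m) <= tol * max(1.0, abs(m_new)):
            return q, m_new, iteration
        m = m_new
    raise RuntimeError("coupling iteration did not converge")


def toy_flow(m):
    # Hypothetical closure: surface blowing reduces the heat flux.
    return 100.0 / (1.0 + 0.5 * m)


def toy_ablation(q):
    # Hypothetical closure: ablation rate proportional to heat flux.
    return 0.01 * q
```

Calling `loosely_couple(toy_flow, toy_ablation)` returns the converged pair together with the number of exchanges needed; each solver is treated as a black box, which is the main appeal of loose coupling over a monolithic solve.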

48th AIAA Aerospace Sciences Meeting Including the New Horizons Forum and Aerospace Exposition | 2010

On the (in)validation of a thermochemical model with EAST shock tube radiation measurements

Kenji Miki; Marco Panesi; Ernesto E. Prudencio; Andre Maurente; Sai Hung Cheung; Jeremy Jagodzinski; David B. Goldstein; Serge Prudhomme; Karl W. Schulz; Chris Simmons; James S. Strand; Philip L. Varghese

Kenji Miki, Marco Panesi, Ernesto E. Prudencio, Andre Maurente, Sai Hung Cheung, Jeremy Jagodzinski, David Goldstein, Serge Prudhomme, Karl Schulz, Chris Simmons, James Strand, and Philip Varghese. Center for Predictive Engineering and Computational Sciences (PECOS), Institute for Computational Engineering and Sciences (ICES), The University of Texas at Austin, 1 University Station C0200, Austin, Texas 78712, USA
