Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where William L. Barth is active.

Publication


Featured research published by William L. Barth.


IEEE International Conference on High Performance Computing, Data and Analytics | 2014

Petascale high order dynamic rupture earthquake simulations on heterogeneous supercomputers

Alexander Heinecke; Alexander Breuer; Sebastian Rettenberger; Michael Bader; Alice-Agnes Gabriel; Christian Pelties; Arndt Bode; William L. Barth; Xiangke Liao; Karthikeyan Vaidyanathan; Mikhail Smelyanskiy; Pradeep Dubey

We present an end-to-end optimization of the innovative Arbitrary high-order DERivative Discontinuous Galerkin (ADER-DG) software SeisSol targeting Intel® Xeon Phi coprocessor platforms, achieving unprecedented earthquake model complexity through coupled simulation of full frictional sliding and seismic wave propagation. SeisSol exploits unstructured meshes to flexibly adapt to complicated geometries in realistic geological models. Seismic wave propagation is solved simultaneously with earthquake faulting in a multiphysical manner, leading to a heterogeneous solver structure. Our architecture-aware optimizations deliver up to 50% of peak performance and introduce an efficient compute-communication overlapping scheme shadowing the multiphysics computations. SeisSol delivers near-optimal weak scaling, reaching 8.6 DP-PFLOPS on 8,192 nodes of the Tianhe-2 supercomputer. Our performance model projects reaching 18--20 DP-PFLOPS on the full Tianhe-2 machine. Of special relevance to modern civil engineering needs, our pioneering simulation of the 1992 Landers earthquake shows highly detailed rupture evolution and ground motion at frequencies up to 10 Hz.
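The compute-communication overlapping scheme mentioned above follows a familiar pattern: post non-blocking MPI transfers for the boundary (halo) data first, then perform the communication-independent work while the messages are in flight. The minimal Python/mpi4py sketch below shows only that generic pattern; the buffers and sizes are invented and it is not SeisSol code.

# Generic compute/communication overlap with non-blocking MPI.
# Illustrative only: not SeisSol code; buffers and sizes are made up.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left, right = (rank - 1) % size, (rank + 1) % size

halo_send = np.ones(1024)                         # boundary data for the neighbor
halo_recv = np.empty(1024)                        # incoming boundary data
interior = np.arange(100_000, dtype=np.float64)   # cells that need no neighbor data

# Post the communication first ...
reqs = [comm.Isend(halo_send, dest=right, tag=0),
        comm.Irecv(halo_recv, source=left, tag=0)]

# ... then "shadow" it with the communication-independent computation.
interior_result = np.sin(interior).sum()

# Only now wait for the halo and finish the boundary work.
MPI.Request.Waitall(reqs)
boundary_result = halo_recv.sum()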


International Conference on Supercomputing | 2010

Quantifying performance benefits of overlap using MPI-2 in a seismic modeling application

Sreeram Potluri; Ping Lai; Karen Tomko; Sayantan Sur; Yifeng Cui; Mahidhar Tatineni; Karl W. Schulz; William L. Barth; Amitava Majumdar; Dhabaleswar K. Panda

AWM-Olsen is a widely used ground motion simulation code based on a parallel finite difference solution of the 3-D velocity-stress wave equation. This application runs on tens of thousands of cores and consumes several million CPU hours on the TeraGrid clusters every year. A significant portion of its run-time (37% in a 4,096-process run) is spent in MPI communication routines. Hence, it demands an optimized communication design coupled with a low-latency, high-bandwidth network and an efficient communication subsystem for good performance. In this paper, we analyze the performance bottlenecks of the application with regard to the time spent in MPI communication calls. We find that much of this time can be overlapped with computation using MPI non-blocking calls. We use both two-sided and MPI-2 one-sided communication semantics to re-design the communication in AWM-Olsen. We find that with our new design, using MPI-2 one-sided communication semantics, the entire application can be sped up by 12% at 4K processes and by 10% at 8K processes on a state-of-the-art InfiniBand cluster, Ranger, at the Texas Advanced Computing Center (TACC).
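As a rough sketch of the one-sided idea described above, each rank below exposes its ghost-cell buffer in an MPI window and its neighbor deposits boundary data with a put while local computation proceeds inside the same epoch. This is a generic, fence-synchronized example written with Python/mpi4py for brevity; it is not the AWM-Olsen redesign and all names are illustrative.

# One-sided (MPI-2) halo deposit overlapped with local computation.
# Illustrative only: not the AWM-Olsen communication redesign.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
right = (rank + 1) % size

ghost = np.zeros(1024)                  # exposed region: the neighbor writes here
local = np.full(1024, float(rank))      # boundary data to push to the neighbor
win = MPI.Win.Create(ghost, comm=comm)  # window over the ghost-cell buffer

win.Fence()                             # open an access/exposure epoch
win.Put(local, right)                   # one-sided deposit into the neighbor's ghost cells
interior_work = np.sqrt(np.arange(1_000_000.0)).sum()  # overlap: local computation
win.Fence()                             # close the epoch; ghost cells are now valid

win.Free()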


International Conference on Management of Data | 2006

Scientific formats for object-relational database systems: a study of suitability and performance

Shirley Cohen; Patrick Hurley; Karl W. Schulz; William L. Barth; Brad Benton

Commercial database management systems (DBMSs) have historically seen very limited use within the scientific computing community. One reason for this absence is that previous database systems lacked support for the extensible data structures and performance features required within a high-performance computing context. However, database vendors have recently enhanced the functionality of their systems by adding object extensions to the relational engine. In principle, these extensions allow for the representation of a rich collection of scientific datatypes and common statistical operations. Utilizing these new extensions, this paper presents a study of the suitability of incorporating two popular scientific formats, NetCDF and HDF, into an object-relational system. To assess the performance of the database approach, a series of solution variables from a regional weather forecast model are used to build representative small, medium and large databases. Common statistical operations and array element queries are then performed using the object-relational database, and the execution timings are compared against native NetCDF and HDF operations.
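The core idea, mapping array variables from a scientific format into relational tables so that statistical operations can be expressed in SQL, can be sketched at toy scale as follows. Here sqlite3 stands in for the commercial object-relational systems evaluated in the paper, and a synthetic NumPy array stands in for a NetCDF/HDF forecast variable; nothing below reflects the schemas or systems actually benchmarked.

# Toy illustration: load an array variable into a relational table and
# run a statistical query in SQL. sqlite3 and the synthetic array are
# stand-ins; the study used NetCDF/HDF data in a commercial ORDBMS.
import sqlite3
import numpy as np

# Synthetic stand-in for a forecast-model variable, e.g. temperature(time, y, x).
temperature = np.random.default_rng(0).normal(280.0, 5.0, size=(4, 32, 32))

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE temperature (t INTEGER, y INTEGER, x INTEGER, value REAL)")
rows = ((int(t), int(y), int(x), float(temperature[t, y, x]))
        for t in range(temperature.shape[0])
        for y in range(temperature.shape[1])
        for x in range(temperature.shape[2]))
con.executemany("INSERT INTO temperature VALUES (?, ?, ?, ?)", rows)

# A common statistical operation expressed in SQL: per-timestep mean,
# comparable to averaging the same slab directly with the NetCDF/HDF APIs.
for t, mean in con.execute("SELECT t, AVG(value) FROM temperature GROUP BY t"):
    print(f"timestep {t}: mean = {mean:.2f} K")

# Array-element query: a single point lookup by index.
print(con.execute("SELECT value FROM temperature WHERE t=0 AND y=3 AND x=7").fetchone())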


IEEE International Conference on High Performance Computing, Data and Analytics | 2014

Comprehensive resource use monitoring for HPC systems with TACC Stats

R. Todd Evans; William L. Barth; James C. Browne; Robert L. DeLeon; Thomas R. Furlani; Steven M. Gallo; Matthew D. Jones; Abani K. Patra

This paper reports on a comprehensive, fully automated resource use monitoring package, TACC Stats, which enables consultants, users, and other stakeholders in an HPC system to systematically and actively identify jobs/applications that could benefit from expert support and to aid in the diagnosis of software and hardware issues. TACC Stats continuously collects and analyzes resource usage data for every job run on a system and differs significantly from conventional profilers because it requires no action on the part of the user or consultants -- it is always collecting data on every node for every job. TACC Stats is open source and downloadable, configurable and compatible with general Linux-based computing platforms, and extensible to new CPU architectures and hardware devices. It is meant to provide a comprehensive resource usage monitoring solution. In addition to describing TACC Stats, the paper illustrates its application to identifying production jobs which have inefficient resource use characteristics.
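The collection side of such a monitor can be sketched as a passive sampler that reads kernel counters on each node and tags every record with the job it belongs to, with no user action involved. The Python sketch below is Linux-only (it reads /proc), and its record layout and job-ID handling are illustrative; it does not reproduce TACC Stats' actual collectors or file format.

# Minimal sketch of passive, node-level sampling: read kernel counters and
# emit a timestamped record. Record layout is illustrative, not TACC Stats'.
import json
import time

def read_cpu_jiffies(path="/proc/stat"):
    """Return the aggregate per-state CPU jiffy counters from /proc/stat."""
    with open(path) as f:
        fields = f.readline().split()   # "cpu user nice system idle ..."
    names = ["user", "nice", "system", "idle", "iowait", "irq", "softirq"]
    return dict(zip(names, map(int, fields[1:8])))

def read_meminfo(path="/proc/meminfo"):
    """Return MemTotal/MemFree in kB from /proc/meminfo."""
    wanted = {"MemTotal", "MemFree"}
    out = {}
    with open(path) as f:
        for line in f:
            key, value = line.split(":")[0], line.split()[1]
            if key in wanted:
                out[key + "_kB"] = int(value)
    return out

def sample(jobid="illustrative-job-id"):   # jobid is a placeholder, not a real scheme
    return {"time": time.time(), "jobid": jobid,
            "cpu": read_cpu_jiffies(), "mem": read_meminfo()}

if __name__ == "__main__":
    print(json.dumps(sample(), indent=2))  # one sampling interval's record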


Numerical Heat Transfer, Part B: Fundamentals | 2006

On a natural-convection benchmark problem in non-Newtonian fluids

William L. Barth; Graham F. Carey

Computational results are compared to experimental benchmark results for natural convection of a Newtonian fluid in a cubical cavity. These results are then extended to Powell-Eyring and extended Williamson fluids. Good agreement is seen between the experimental and computational results for most of the Newtonian cases. Results from the non-Newtonian cases are proposed for comparison to future experiments. An issue raised by the Newtonian experimental study is identified and a probable resolution is described. Comparison of Newtonian and non-Newtonian results shows increased heat flux with increasing nonlinearity of the models. The non-Newtonian cases for the diagonally inclined orientation also show periodic behavior not seen in the Newtonian cases.
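For context, these constitutive models replace the constant Newtonian viscosity with an effective viscosity that depends on the shear rate. A commonly quoted form of the Powell-Eyring model is written below in LaTeX; the exact parameterization, and the extended Williamson form, used in the paper may differ.

% Common form of the Powell-Eyring generalized-Newtonian viscosity
% (assumed form for illustration; the paper's parameterization may differ).
\[
  \eta(\dot{\gamma}) \;=\; \mu_\infty
    + \left(\mu_0 - \mu_\infty\right)\,
      \frac{\sinh^{-1}\!\left(\lambda \dot{\gamma}\right)}{\lambda \dot{\gamma}}
\]

Here \mu_0 and \mu_\infty are the zero- and infinite-shear-rate viscosities, \lambda is a characteristic time, and \dot{\gamma} is the shear rate; the ratio tends to 1 as the shear rate vanishes and to 0 as it grows, so the effective viscosity falls from \mu_0 toward \mu_\infty (shear thinning).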


IEEE International Conference on High Performance Computing, Data and Analytics | 2011

Best practices for the deployment and management of production HPC clusters

Robert T. McLay; Karl W. Schulz; William L. Barth; Tommy Minyard

Commodity-based Linux HPC clusters dominate the scientific computing landscape in both academia and industry, ranging from small research clusters to petascale supercomputers supporting thousands of users. To support broad user communities and manage a user-friendly environment, end-user sites must combine a range of low-level system software with multiple compiler chains, support libraries, and a suite of 3rd party applications. In addition, large systems require bare-metal provisioning and a flexible software management strategy to maintain consistency and upgradeability across thousands of compute nodes. This report documents a Linux operating system framework (LosF), which has evolved over the last seven years to provide an integrated strategy for the deployment of multiple HPC systems at the Texas Advanced Computing Center. Documented within this effort are the high-level cluster configuration options and definitions, bare-metal provisioning, hierarchical HPC software stack design, package management, user environment management tools, user account synchronization, and local customization configurations.


IEEE International Conference on High Performance Computing, Data and Analytics | 2013

Enabling comprehensive data-driven system management for large computational facilities

James C. Browne; Robert L. DeLeon; Charng-Da Lu; Matthew D. Jones; Steven M. Gallo; Amin Ghadersohi; Abani K. Patra; William L. Barth; John Hammond; Thomas R. Furlani; Robert T. McLay

This paper presents a tool chain, based on the open source tool TACC_Stats, for systematic and comprehensive job level resource use measurement for large cluster computers, and its incorporation into XDMoD, a reporting and analytics framework for resource management that targets meeting the information needs of users, application developers, systems administrators, systems management and funding managers. Accounting, scheduler and event logs are integrated with system performance data from TACC_Stats. TACC_Stats periodically records resource use including many hardware counters for each job running on each node. Furthermore, system level metrics are obtained through aggregation of the node (job) level data. Analysis of this data generates many types of standard and custom reports and even a limited predictive capability that has not previously been available for open-source, Linux-based software systems. This paper presents case studies of information that can be applied for effective resource management. We believe this system to be the first fully comprehensive system for supporting the information needs of all stakeholders in open-source software based HPC systems.
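As a toy illustration of the aggregation described above, node- and job-level samples rolled up into job-level and then system-level metrics, the sketch below computes CPU utilization at both levels. The record fields and numbers are invented for the example and do not reproduce the TACC_Stats or XDMoD schemas.

# Toy roll-up of per-node, per-job samples into job- and system-level metrics.
# Fields and values are invented; not the TACC_Stats/XDMoD data model.
from collections import defaultdict

# (jobid, node, cpu_seconds_used, wallclock_seconds, cores)
samples = [
    ("job-1", "c401-001", 14_300.0, 1_000.0, 16),
    ("job-1", "c401-002", 15_800.0, 1_000.0, 16),
    ("job-2", "c402-007",    900.0, 3_600.0, 16),
]

# Job-level roll-up: CPU utilization per job across all of its nodes.
per_job = defaultdict(lambda: [0.0, 0.0])
for jobid, _node, cpu_s, wall_s, cores in samples:
    per_job[jobid][0] += cpu_s
    per_job[jobid][1] += wall_s * cores
for jobid, (used, available) in per_job.items():
    print(f"{jobid}: {100.0 * used / available:.1f}% CPU utilization")

# System-level roll-up: aggregate the same counters over every job and node.
used = sum(s[2] for s in samples)
available = sum(s[3] * s[4] for s in samples)
print(f"system: {100.0 * used / available:.1f}% CPU utilization")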


IEEE International Conference on High Performance Computing, Data and Analytics | 2014

A user-friendly approach for tuning parallel file operations

Robert T. McLay; Doug James; Si Liu; John Cazes; William L. Barth

The Lustre file system provides high aggregated I/O bandwidth and is in widespread use throughout the HPC community. Here we report on work (1) developing a model for understanding collective parallel MPI write operations on Lustre, and (2) producing a library that optimizes parallel write performance in a user-friendly way. We note that a system's default stripe count is rarely a good choice for parallel I/O, and that performance depends on a delicate balance between the number of stripes and the actual (not requested) number of collective writers. Unfortunate combinations of these parameters may degrade performance considerably. For the programmer, however, it's all about the stripe count: an informed choice of this single parameter allows MPI to assign writers in a way that achieves near-optimal performance. We offer recommendations for those who wish to tune performance manually and describe the easy-to-use T3PIO library that manages the tuning automatically.
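For those tuning manually as suggested above, the usual mechanism is to pass Lustre striping hints to MPI-IO before a collective write; T3PIO automates the same decision. The mpi4py sketch below uses the common ROMIO/Lustre hint names and an invented file path, and does not reproduce the T3PIO API.

# Manual stripe-count tuning via MPI-IO hints before a collective write.
# Hint names are the common ROMIO/Lustre ones; the path is illustrative.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

info = MPI.Info.Create()
info.Set("striping_factor", "16")                 # stripe count: the key parameter
info.Set("striping_unit", str(4 * 1024 * 1024))   # 4 MiB stripe size

amode = MPI.MODE_CREATE | MPI.MODE_WRONLY
fh = MPI.File.Open(comm, "output.dat", amode, info)   # illustrative output path

data = np.full(1_000_000, rank, dtype=np.float64)
offset = rank * data.nbytes
fh.Write_at_all(offset, data)   # collective write; MPI assigns the actual writers

fh.Close()
info.Free()

Because Lustre fixes a file's striping when the file is created, the hints must be supplied at creation time; requesting them later has no effect.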


Extreme Science and Engineering Discovery Environment | 2014

An Analysis of Node Sharing on HPC Clusters using XDMoD/TACC_Stats

Joseph P. White; Robert L. DeLeon; Thomas R. Furlani; Steven M. Gallo; Matthew D. Jones; Amin Ghadersohi; Cynthia D. Cornelius; Abani K. Patra; James C. Browne; William L. Barth; John Hammond

When a user requests less than a full node for a job on XSEDE's large resources Stampede and Lonestar4, that is, fewer than 16 cores on Stampede or 12 cores on Lonestar4, they are assigned a full node by policy. Although the actual CPU hours consumed by these jobs are small when compared to the total CPU hours delivered by these resources, they do represent a substantial fraction of the total number of jobs (~18% for Stampede and ~15% for Lonestar4 between January and February 2014). Academic HPC centers, such as the Center for Computational Research (CCR) at the University at Buffalo, SUNY, typically have a much larger proportion of small jobs than the large XSEDE systems. For CCR's production cluster, Rush, the decision was made to allow the allocation of simultaneous jobs on the same node. This greatly increases the overall throughput but also raises the question of whether jobs that share the same node will interfere with one another. We present here an analysis that explores this issue using data from Rush, Stampede and Lonestar4. Analysis of usage data indicates little interference.


Scopus | 2014

Comprehensive, open-source resource usage measurement and analysis for HPC systems

James C. Browne; Robert L. DeLeon; Abani K. Patra; William L. Barth; John Hammond; Jones; Tom Furlani; Barry I. Schneider; Steven M. Gallo; Amin Ghadersohi; Ryan J. Gentner; Jeffrey T. Palmer; Nikolay Simakov; Martins Innus; Andrew E. Bruno; Joseph P. White; Cynthia D. Cornelius; Thomas Yearke; Kyle Marcus; G. Von Laszewski; Fugang Wang

The important role that high-performance computing (HPC) resources play in science and engineering research, coupled with their high cost (capital, power and manpower), short life and oversubscription, requires us to optimize their usage, an outcome that is only possible if adequate analytical data are collected and used to drive systems management at different granularities: job, application, user and system. This paper presents a method for comprehensive job, application and system-level resource use measurement and analysis, and its implementation. The steps in the method are system-wide collection of comprehensive resource use and performance statistics at the job and node levels in a uniform format across all resources, and mapping and storage of the resultant job-wise data to a relational database, which enables further transformation of the data to the formats required by specific statistical and analytical algorithms. Analyses can be carried out at different levels of granularity: job, user, application or system-wide. Measurements are based on a new lightweight job-centric measurement tool, 'TACC_Stats', which gathers a comprehensive set of resource use metrics on all compute nodes, and data logged by the system scheduler. The data mapping and analysis tools are an extension of the XDMoD project. The method is illustrated with analyses of resource use for the Texas Advanced Computing Center's Lonestar4, Ranger and Stampede supercomputers and the HPC cluster at the Center for Computational Research. The illustrations are focused on resource use at the system, job and application levels and reveal many interesting insights into system usage patterns and also anomalous behavior due to failure/misuse. The method can be applied to any system that runs the TACC_Stats measurement tool and a tool to extract job execution environment data from the system scheduler.

Collaboration


Dive into William L. Barth's collaborations.

Top Co-Authors

James C. Browne, University of Texas at Austin

Graham F. Carey, University of Texas at Austin

John Hammond, University of Texas at Austin

Karl W. Schulz, University of Texas at Austin

Robert T. McLay, University of Texas at Austin