Publications


Featured research published by Robert T. McLay.


Frontiers in Plant Science | 2011

The iPlant Collaborative: Cyberinfrastructure for Plant Biology

Stephen A. Goff; Matthew W. Vaughn; Sheldon J. McKay; Eric Lyons; Ann E. Stapleton; Damian Gessler; Naim Matasci; Liya Wang; Matthew R. Hanlon; Andrew Lenards; Andy Muir; Nirav Merchant; Sonya Lowry; Stephen A. Mock; Matthew Helmke; Adam Kubach; Martha L. Narro; Nicole Hopkins; David Micklos; Uwe Hilgert; Michael Gonzales; Chris Jordan; Edwin Skidmore; Rion Dooley; John Cazes; Robert T. McLay; Zhenyuan Lu; Shiran Pasternak; Lars Koesterke; William H. Piel

The iPlant Collaborative (iPlant) is a United States National Science Foundation (NSF) funded project that aims to create an innovative, comprehensive, and foundational cyberinfrastructure in support of plant biology research (PSCIC, 2006). iPlant is developing cyberinfrastructure that uniquely enables scientists throughout the diverse fields that comprise plant biology to address Grand Challenges in new ways, to stimulate and facilitate cross-disciplinary research, to promote biology and computer science research interactions, and to train the next generation of scientists on the use of cyberinfrastructure in research and education. Meeting humanity's projected demands for agricultural and forest products and the expectation that natural ecosystems be managed sustainably will require synergies from the application of information technologies. The iPlant cyberinfrastructure design is based on an unprecedented period of research community input, and leverages developments in high-performance computing, data storage, and cyberinfrastructure for the physical sciences. iPlant is an open-source project with application programming interfaces that allow the community to extend the infrastructure to meet its needs. iPlant is sponsoring community-driven workshops addressing specific scientific questions via analysis tool integration and hypothesis testing. These workshops teach researchers how to add bioinformatics tools and/or datasets into the iPlant cyberinfrastructure, enabling plant scientists to perform complex analyses on large datasets without the need to master the command line or high-performance computational services.


International Conference on Cluster Computing | 2011

Design and Evaluation of Network Topology-/Speed-Aware Broadcast Algorithms for InfiniBand Clusters

Hari Subramoni; Krishna Chaitanya Kandalla; Jérôme Vienne; Sayantan Sur; Bill Barth; Karen Tomko; Robert T. McLay; Karl W. Schulz; Dhabaleswar K. Panda

It is an established fact that network topology can have an impact on the performance of scientific parallel applications. However, little work has been done to design an easy-to-use solution inside a communication library supporting a parallel programming model, where the complexities of making application performance agnostic to the network topology are hidden from the end user. Similarly, rapid improvements in networking technology and speed mean that many commodity clusters are becoming heterogeneous with respect to networking speed. For example, switches and adapters belonging to different generations (SDR - 8 Gbps, DDR - 16 Gbps, and QDR - 32 Gbps speeds in InfiniBand) are integrated into a single system. This poses an additional challenge: the communication library must be made aware of the performance implications of heterogeneous link speeds so that it can perform optimizations that take link speed into account. In this paper, we propose a framework to automatically detect the topology and speed of an InfiniBand network and make this information available to users through an easy-to-use interface. We also make design changes inside the MPI library to dynamically query this topology detection service and form a topology model of the underlying network. We have redesigned the broadcast algorithm to take this network topology information into account and dynamically adapt the communication pattern to best fit the characteristics of the underlying network. To the best of our knowledge, this is the first such work for InfiniBand clusters. Our experimental results show that, for large homogeneous systems and large message sizes, the proposed network topology-aware scheme improves the latency of the broadcast operation by up to 14% over the default scheme at the micro-benchmark level. At the application level, the proposed framework delivers up to 8% improvement in total application run-time, especially as job size scales up. The proposed network speed-aware algorithms allow micro-benchmark performance on the heterogeneous SDR-DDR InfiniBand cluster to be on par with runs on the DDR-only portion of the cluster for small to medium sized messages. We also demonstrate that the network speed-aware algorithms perform 70% to 100% better than the naive algorithms when both are run on the heterogeneous SDR-DDR InfiniBand cluster.
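
The core idea, adapting a collective so that most traffic stays within a leaf switch, can be sketched as a two-level broadcast. The sketch below uses mpi4py; the switch_id_for_rank() helper and its block assignment of ranks to switches are hypothetical stand-ins for the paper's topology-detection service, the broadcast root is assumed to be rank 0, and this is not the MPI-library implementation described in the paper.

```python
# Minimal two-level, topology-aware broadcast sketch (mpi4py).
from mpi4py import MPI

def switch_id_for_rank(rank, ranks_per_switch=16):
    # Assumption: ranks are packed onto leaf switches in contiguous blocks.
    return rank // ranks_per_switch

def topology_aware_bcast(obj, comm=MPI.COMM_WORLD):
    rank = comm.Get_rank()
    # Sub-communicator of the ranks that share a leaf switch.
    switch_comm = comm.Split(color=switch_id_for_rank(rank), key=rank)
    is_leader = switch_comm.Get_rank() == 0
    # Sub-communicator containing one leader rank per switch.
    leader_comm = comm.Split(color=0 if is_leader else MPI.UNDEFINED, key=rank)
    # Stage 1: broadcast among switch leaders (global rank 0 is leader 0).
    if is_leader:
        obj = leader_comm.bcast(obj, root=0)
        leader_comm.Free()
    # Stage 2: broadcast within each switch over short intra-switch paths.
    obj = switch_comm.bcast(obj, root=0)
    switch_comm.Free()
    return obj

if __name__ == "__main__":
    data = {"payload": 42} if MPI.COMM_WORLD.Get_rank() == 0 else None
    print(MPI.COMM_WORLD.Get_rank(), topology_aware_bcast(data))
```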


Proceedings of the First International Workshop on HPC User Support Tools | 2014

Modern scientific software management using EasyBuild and Lmod

Markus Geimer; Kenneth Hoste; Robert T. McLay

HPC user support teams invest a lot of time and effort in installing scientific software for their users. A well-established practice is providing environment modules to make it easy for users to set up their working environment. Several problems remain, however: user support teams lack appropriate tools to manage a scientific software stack easily and consistently, and users still struggle to set up their working environment correctly. In this paper, we present a modern approach to installing (scientific) software that provides a solution to these common issues. We show how EasyBuild, a software build and installation framework, can be used to automatically install software and generate environment modules. By using a hierarchical module naming scheme to offer environment modules to users in a more structured way, and providing Lmod, a modern tool for working with environment modules, we help typical users avoid common mistakes while giving power users the flexibility they demand.
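
The hierarchical module naming scheme can be illustrated with a short sketch of how MODULEPATH grows as a compiler and an MPI stack are loaded, so that only compatible modules become visible to the user. The directory layout, install prefix, and version numbers below are illustrative assumptions, not the exact scheme used by EasyBuild, Lmod, or any particular site.

```python
# Sketch: which branches of a hierarchical module tree are visible.
import os

MODULE_ROOT = "/opt/apps/modulefiles"  # hypothetical install prefix

def modulepath(compiler=None, mpi=None):
    """Return MODULEPATH entries for the given loaded compiler/MPI pair."""
    paths = [os.path.join(MODULE_ROOT, "Core")]           # always visible
    if compiler:
        name, ver = compiler
        paths.insert(0, os.path.join(MODULE_ROOT, "Compiler", name, ver))
        if mpi:
            mpi_name, mpi_ver = mpi
            paths.insert(0, os.path.join(
                MODULE_ROOT, "MPI", name, ver, mpi_name, mpi_ver))
    return paths

# Before any compiler is loaded only Core modules can be seen; loading a
# compiler and then an MPI library reveals the matching sub-trees.
print(":".join(modulepath()))
print(":".join(modulepath(("GCC", "4.8.2"), ("OpenMPI", "1.7.3"))))
```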


IEEE International Conference on High Performance Computing Data and Analytics | 2011

Best practices for the deployment and management of production HPC clusters

Robert T. McLay; Karl W. Schulz; William L. Barth; Tommy Minyard

Commodity-based Linux HPC clusters dominate the scientific computing landscape in both academia and industry, ranging from small research clusters to petascale supercomputers supporting thousands of users. To support broad user communities and maintain a user-friendly environment, end-user sites must combine a range of low-level system software with multiple compiler chains, support libraries, and a suite of 3rd-party applications. In addition, large systems require bare-metal provisioning and a flexible software management strategy to maintain consistency and upgradeability across thousands of compute nodes. This report documents a Linux operating system framework (LosF) that has evolved over the last seven years to provide an integrated strategy for the deployment of multiple HPC systems at the Texas Advanced Computing Center. Documented within this effort are the high-level cluster configuration options and definitions, bare-metal provisioning, hierarchical HPC software stack design, package management, user environment management tools, user account synchronization, and local customization configurations.
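
One small ingredient of such a software management strategy is a node-consistency check: compare what a node actually has installed against the site-wide definition for its node type. The sketch below is a generic, hypothetical illustration of that idea; the desired-package list and the use of rpm queries are assumptions, not LosF's actual configuration syntax or tooling.

```python
# Sketch: report packages that are missing or at the wrong version on a node.
import subprocess

def installed_rpms():
    """Map installed package name -> version via rpm's query format."""
    out = subprocess.run(
        ["rpm", "-qa", "--qf", "%{NAME} %{VERSION}\n"],
        capture_output=True, text=True, check=True).stdout
    return dict(line.split() for line in out.splitlines() if line)

def check_node(desired):
    have = installed_rpms()
    for name, version in desired.items():
        if have.get(name) != version:
            print(f"MISMATCH {name}: want {version}, have {have.get(name)}")

# Example site-wide definition for a compute-node type (illustrative values).
check_node({"slurm": "2.6.5", "lustre-client": "2.4.2"})
```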


Journal of Parallel and Distributed Computing | 1996

Maximizing Sparse Matrix-Vector Product Performance on RISC Based MIMD Computers

Robert T. McLay; Spencer Swift; Graham F. Carey

The matrix-vector product kernel can represent most of the computation in a gradient iterative solver. Thus, an efficient solver requires that the matrix-vector product kernel be fast. We show that standard approaches with Fortran or C may not deliver good performance and present a strategy involving managing the cache to improve the performance. As an example, using this approach we demonstrate that it is possible to achieve 2.5 times better performance over a Fortran implementation with an assembly coded kernel on an Intel i860. These issues are of general interest for all computer architectures but are particularly important for users of MIMD computers to achieve a useful fraction of the advertised peak performance of these machines.
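
For reference, the kernel in question is the compressed sparse row (CSR) matrix-vector product shown below. This plain Python/NumPy version only illustrates the access pattern, in particular the indirect gather of x[col[k]] that makes cache behaviour dominate performance; the paper's blocking and assembly-level optimizations are not reproduced here.

```python
# Minimal CSR matrix-vector product: y = A * x.
import numpy as np

def csr_matvec(row_ptr, col, val, x):
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):                       # one dot product per row
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += val[k] * x[col[k]]               # indirect, cache-unfriendly access
        y[i] = s
    return y

# 3x3 example matrix [[4,1,0],[0,3,0],[2,0,5]] times the vector of ones.
row_ptr = np.array([0, 2, 3, 5])
col     = np.array([0, 1, 1, 0, 2])
val     = np.array([4.0, 1.0, 3.0, 2.0, 5.0])
print(csr_matvec(row_ptr, col, val, np.ones(3)))   # -> [5. 3. 7.]
```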


IEEE International Conference on High Performance Computing Data and Analytics | 2013

Enabling comprehensive data-driven system management for large computational facilities

James C. Browne; Robert L. DeLeon; Charng-Da Lu; Matthew D. Jones; Steven M. Gallo; Amin Ghadersohi; Abani K. Patra; William L. Barth; John Hammond; Thomas R. Furlani; Robert T. McLay

This paper presents a tool chain, based on the open source tool TACC_Stats, for systematic and comprehensive job level resource use measurement for large cluster computers, and its incorporation into XDMoD, a reporting and analytics framework for resource management that targets meeting the information needs of users, application developers, systems administrators, systems management and funding managers. Accounting, scheduler and event logs are integrated with system performance data from TACC_Stats. TACC_Stats periodically records resource use including many hardware counters for each job running on each node. Furthermore, system level metrics are obtained through aggregation of the node (job) level data. Analysis of this data generates many types of standard and custom reports and even a limited predictive capability that has not previously been available for open-source, Linux-based software systems. This paper presents case studies of information that can be applied for effective resource management. We believe this system to be the first fully comprehensive system for supporting the information needs of all stakeholders in open-source software based HPC systems.
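
The aggregation step, rolling per-node samples up to job-level totals and then to system-level metrics, can be sketched as follows. The record layout and metric names are hypothetical stand-ins, not TACC_Stats' actual on-disk format.

```python
# Sketch: aggregate per-node samples to job-level and system-level metrics.
from collections import defaultdict

samples = [  # (job_id, node, metrics) -- illustrative data only
    ("1001", "c401-101", {"flops": 4.0e12, "mem_bw_bytes": 9.0e11}),
    ("1001", "c401-102", {"flops": 3.8e12, "mem_bw_bytes": 8.7e11}),
    ("1002", "c402-001", {"flops": 1.2e11, "mem_bw_bytes": 2.0e12}),
]

def job_level(samples):
    totals = defaultdict(lambda: defaultdict(float))
    for job_id, _node, metrics in samples:
        for name, value in metrics.items():
            totals[job_id][name] += value         # sum over a job's nodes
    return totals

def system_level(job_totals):
    agg = defaultdict(float)
    for metrics in job_totals.values():
        for name, value in metrics.items():
            agg[name] += value                    # sum over all jobs
    return dict(agg)

jobs = job_level(samples)
print(jobs["1001"]["flops"], system_level(jobs))
```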


IEEE International Conference on High Performance Computing Data and Analytics | 2014

A user-friendly approach for tuning parallel file operations

Robert T. McLay; Doug James; Si Liu; John Cazes; William L. Barth

The Lustre file system provides high aggregated I/O bandwidth and is in widespread use throughout the HPC community. Here we report on work (1) developing a model for understanding collective parallel MPI write operations on Lustre, and (2) producing a library that optimizes parallel write performance in a user-friendly way. We note that a system's default stripe count is rarely a good choice for parallel I/O, and that performance depends on a delicate balance between the number of stripes and the actual (not requested) number of collective writers. Unfortunate combinations of these parameters may degrade performance considerably. For the programmer, however, it's all about the stripe count: an informed choice of this single parameter allows MPI to assign writers in a way that achieves near-optimal performance. We offer recommendations for those who wish to tune performance manually and describe the easy-to-use T3PIO library that manages the tuning automatically.
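
The tuning idea translates into a small amount of code: pick the stripe count from the number of writers rather than accepting the file system default, and pass it to MPI-IO through the standard ROMIO hints. The mpi4py sketch below uses a one-stripe-per-writer heuristic capped at an assumed OST count; T3PIO's actual policy, which also balances stripes against the effective number of collective writers, may differ.

```python
# Sketch: open a file for collective writing with a tuned Lustre stripe count.
from mpi4py import MPI

def open_tuned(comm, filename, max_osts=160):
    writers = comm.Get_size()
    stripes = min(writers, max_osts)              # one stripe per writer, capped
    info = MPI.Info.Create()
    info.Set("striping_factor", str(stripes))     # Lustre stripe count hint
    info.Set("cb_nodes", str(stripes))            # collective-buffering aggregators
    return MPI.File.Open(comm, filename,
                         MPI.MODE_WRONLY | MPI.MODE_CREATE, info)

if __name__ == "__main__":
    fh = open_tuned(MPI.COMM_WORLD, "output.dat")
    fh.Close()
```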


Proceedings of the Second International Workshop on HPC User Support Tools | 2015

Community use of XALT in its first year in production

Reuben D. Budiardja; Mark R. Fahey; Robert T. McLay; Prasad Maddumage Don; Bilel Hadri; Doug James

XALT collects accurate, detailed, and continuous job-level and link-time data and stores that data in a database; all the data collection is transparent to the users. The data stored can be mined to generate a picture of the compilers, libraries, and other software that users need to run their jobs successfully, highlighting the products that researchers use. We showcase how data collected by XALT can be easily mined into a digestible format by presenting data from four separate HPC centers. XALT is already used by many HPC centers around the world due to its usefulness and its complementarity with existing logs and databases. Centers with XALT have a much better understanding of library and executable usage and patterns. We also present new functionality in XALT, namely the ability to anonymize data, and early work on providing seamless access to provenance data.


Proceedings of the XSEDE16 Conference on Diversity, Big Data, and Science at Scale | 2016

A web interface for XALT log data analysis

Ruizhu Huang; Weijia Xu; Robert T. McLay

XALT is a job-monitoring tool that collects accurate, detailed, and continuous job-level and link-time data on all MPI jobs running on a computing cluster. Due to its usefulness and complementarity with other system logs, XALT has been deployed on Stampede at the Texas Advanced Computing Center and on other high performance computing resources around the world. The data collected by XALT can be extremely valuable in helping resource providers understand resource usage and identify patterns and insights for future improvements. However, the volume of data collected by XALT grows quickly over time on large systems and presents challenges for access and analysis. In this paper, we describe the development of a prototype tool to analyze and visualize XALT data. The application utilizes Spark for efficient processing of large volumes of log data and R for interactive visualization of the results over the web. It provides an easy-to-use interface for users to conveniently share and communicate executable usage and patterns without prerequisite knowledge of big data technologies. We detail the features of this tool, its current development status, a performance evaluation, and exemplar use cases.
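
To give a flavour of the Spark-side processing, the sketch below uses PySpark to count how often each executable appears in XALT-style run records. The JSON export path and the column names (exec_path, user) are assumptions for illustration, not XALT's actual schema, and the tool described in the paper pairs Spark with R rather than plain PySpark.

```python
# Sketch: rank executables by run count from exported XALT-style records.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("xalt-usage").getOrCreate()

# Hypothetical export: one JSON run record per line.
runs = spark.read.json("xalt_run_records.json")

top_executables = (runs
                   .groupBy("exec_path")                       # assumed column name
                   .agg(F.count("*").alias("runs"),
                        F.countDistinct("user").alias("users"))
                   .orderBy(F.desc("runs")))

top_executables.show(20, truncate=False)
```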


Conference on High Performance Computing (Supercomputing) | 1997

MPP Solution of Rayleigh-Bénard-Marangoni Flows

Graham F. Carey; Christopher Harle; Robert T. McLay; Spencer Swift

A domain decomposition strategy and a parallel gradient-type iterative solution scheme have been developed and implemented for the computation of complex 3D viscous flow problems involving heat transfer and surface tension effects. Special attention has been paid to the kernels for the computationally intensive matrix-vector products and dot products, to memory management, and to overlapping communication and computation. Details of these implementation issues are described together with associated performance and scalability studies. Representative Rayleigh-Bénard and microgravity Marangoni flow calculations on the Cray T3D are presented, with performance results verifying a sustained rate in excess of 16 gigaflops on 512 nodes of the T3D. The work is currently being extended to the T3E, where preliminary performance and scalability studies have achieved sustained rates above 50 gigaflops and 100 gigaflops on the 512-node T3E-600 and 1024-node T3E-900 configurations, respectively.
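
The communication/computation overlap mentioned above follows a standard pattern: post nonblocking halo exchanges, update the interior points that need no remote data, then finish the boundary once the halos arrive. The mpi4py sketch below shows one such update step on a 1-D periodic decomposition with a simple averaging stencil; it is illustrative only and not the paper's finite-element kernels.

```python
# Sketch: overlap a ghost-cell exchange with the interior stencil update.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left, right = (rank - 1) % size, (rank + 1) % size

n = 1000
u = np.random.rand(n + 2)            # local slab with one ghost cell per side
new = np.empty_like(u)

# Post the nonblocking halo exchange first (periodic neighbours).
reqs = [comm.Isend(u[1:2],   dest=left,    tag=10),
        comm.Isend(u[n:n+1], dest=right,   tag=20),
        comm.Irecv(u[n+1:],  source=right, tag=10),
        comm.Irecv(u[0:1],   source=left,  tag=20)]

# Overlap: interior points depend only on data already resident locally.
new[2:n] = 0.5 * (u[1:n-1] + u[3:n+1])

# Finish the exchange, then update the two boundary points that need halos.
MPI.Request.Waitall(reqs)
new[1] = 0.5 * (u[0] + u[2])
new[n] = 0.5 * (u[n-1] + u[n+1])
```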

Collaboration


Dive into Robert T. McLay's collaborations.

Top Co-Authors

Doug James, University of Texas at Austin
Si Liu, University of Texas at Austin
William L. Barth, University of Texas at Austin
Graham F. Carey, University of Texas at Austin
Reuben D. Budiardja, National Institute for Computational Sciences
John Cazes, University of Texas at Austin
Karl W. Schulz, University of Texas at Austin