Publication


Featured research published by Matthew J. Sottile.


International Conference on Cluster Computing | 2002

Supermon: a high-speed cluster monitoring system

Matthew J. Sottile; Ronald Minnich

Supermon is a flexible set of tools for high-speed, scalable cluster monitoring. Node behavior can be monitored much faster than with other commonly used methods (e.g., rstatd). In addition, Supermon uses a data protocol based on symbolic expressions (S-expressions) at all levels, from individual nodes to entire clusters. This contributes to Supermon's scalability and allows it to function in a heterogeneous environment. This paper presents the Supermon architecture and discusses initial performance measurements on a cluster of heterogeneous Alpha-processor-based nodes.
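
The abstract highlights S-expressions as the monitoring data format at every level of the system. As a minimal sketch of what consuming such data might look like, the Python fragment below parses a node-status record written as an S-expression; the sample record and its field names are illustrative only, not Supermon's actual wire format.

# Minimal sketch: parsing S-expression-encoded monitoring data in Python.
# The sample record below is hypothetical, not Supermon's real protocol.

def tokenize(text):
    """Split an s-expression string into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Recursively build nested lists from a token stream."""
    token = tokens.pop(0)
    if token == "(":
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return node
    return token  # atom

sample = "(node n042 (loadavg 0.91 0.85 0.78) (memfree 512340))"
tree = parse(tokenize(sample))
print(tree)
# ['node', 'n042', ['loadavg', '0.91', '0.85', '0.78'], ['memfree', '512340']]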


Conference on High Performance Computing (Supercomputing) | 2007

The ghost in the machine: observing the effects of kernel operation on parallel application performance

Aroon Nataraj; Alan Morris; Allen D. Malony; Matthew J. Sottile; Peter H. Beckman

The performance of a parallel application on a scalable HPC system is determined both by user-level execution of the application code and by system-level (OS kernel) operations. To understand the influences of system-level factors on application performance, measurement of OS kernel activities is key. We describe a technology to observe kernel actions and make this information available to application-level performance measurement tools. The benefits of merged application and OS performance information and its use in parallel performance analysis are demonstrated for both profiling and tracing methodologies. In particular, we focus on the problem of kernel noise assessment as a stress test of the approach. We show new results for characterizing noise and introduce new techniques for evaluating noise interference and its effects on application execution. Our kernel measurement and noise analysis technologies are being developed as part of Linux OS environments for scalable parallel systems.


International Parallel and Distributed Processing Symposium | 2006

Performance analysis of parallel programs via message-passing graph traversal

Matthew J. Sottile; Vaddadi P. Chandu; David A. Bader

The ability to understand the factors contributing to parallel program performance is vital for understanding the impact of machine parameters on the performance of specific applications. We propose a methodology for analyzing the performance characteristics of parallel programs based on message-passing traces of their execution on a set of processors. Using this methodology, we explore how perturbations in both single-processor performance and the messaging layer impact the performance of the traced run. This analysis provides a quantitative description of the sensitivity of applications to a variety of performance parameters, giving a better understanding of the range of systems upon which an application can be expected to perform well. These performance parameters include operating system interference and variability in message latencies within the interconnection network layer.
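
To make the trace-perturbation idea concrete, the Python sketch below models a message-passing trace as a small dependency graph, scales compute and message costs, and recomputes the completion time with a longest-path traversal. The toy trace and the scaling interface are hypothetical; they illustrate the style of sensitivity analysis the abstract describes, not the paper's actual implementation.

# Sketch of trace-replay-style sensitivity analysis (illustrative only).
# Events form a DAG; we scale compute and message costs and recompute
# the finish time of the perturbed execution.

# Hypothetical toy trace: event -> (base cost, kind, predecessor events)
trace = {
    "c0": (5.0, "compute", []),
    "c1": (3.0, "compute", []),
    "m01": (1.0, "message", ["c0"]),        # message sent after c0 completes
    "c2": (4.0, "compute", ["c1", "m01"]),  # waits on local work and the message
}

def finish_time(trace, compute_scale=1.0, message_scale=1.0):
    """Longest-path traversal of the event DAG under perturbed costs."""
    done = {}
    def resolve(ev):
        if ev in done:
            return done[ev]
        cost, kind, preds = trace[ev]
        scale = compute_scale if kind == "compute" else message_scale
        start = max((resolve(p) for p in preds), default=0.0)
        done[ev] = start + cost * scale
        return done[ev]
    return max(resolve(ev) for ev in trace)

baseline = finish_time(trace)
slow_net = finish_time(trace, message_scale=4.0)
print(f"baseline {baseline:.1f}, 4x message latency {slow_net:.1f}")
# baseline 10.0, 4x message latency 13.0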


European Conference on Parallel Processing | 2007

TAUoverSupermon: low-overhead online parallel performance monitoring

Aroon Nataraj; Matthew J. Sottile; Alan Morris; Allen D. Malony; Sameer Shende

Online application performance monitoring allows tracking performance characteristics during execution rather than post-mortem. This opens up possibilities otherwise unavailable, such as real-time visualization and application performance steering, which can be useful in the context of long-running applications. As HPC systems grow in size and complexity, the key challenge is to keep the online performance monitor scalable and low-overhead while still providing a useful performance reporting capability. Two fundamental components constitute such a performance monitor: the measurement and transport systems. We adapt and combine two existing, mature systems, TAU and Supermon, to address this problem. TAU performs the measurement while Supermon is used to collect the distributed measurement state. Our experiments show that this approach leads to very low-overhead application monitoring as well as other benefits unavailable when using a transport such as NFS.


Concurrency and Computation: Practice and Experience | 2005

Performance technology for parallel and distributed component software

Allen D. Malony; Sameer Shende; Nick Trebon; Jaideep Ray; Robert C. Armstrong; Craig Edward Rasmussen; Matthew J. Sottile

This work targets the emerging use of software component technology for high‐performance scientific parallel and distributed computing. While component software engineering will benefit the construction of complex science applications, its use presents several challenges to performance measurement, analysis, and optimization. The performance of a component application depends on the interaction (possibly nonlinear) of the composed component set. Furthermore, a component is a ‘binary unit of composition’ and the only information users have is the interface the component provides to the outside world. A performance engineering methodology and development approach is presented to address evaluation and optimization issues in high‐performance component environments. We describe a prototype implementation of a performance measurement infrastructure for the Common Component Architecture (CCA) system. A case study demonstrating the use of this technology for integrated measurement, monitoring, and optimization in CCA component‐based applications is given.


The Journal of Supercomputing | 2006

Rapid prototyping frameworks for developing scientific applications: A case study

Christopher D. Rickett; Sung-Eun Choi; Craig Edward Rasmussen; Matthew J. Sottile

In this paper, we describe a Python-based framework for the rapid prototyping of scientific applications. A case study was performed using a problem specification developed for Marmot, a project at Los Alamos National Laboratory aimed at refactoring standard physics codes into reusable and extensible components. Components were written in Python, ZPL, Fortran, and C++ following the Marmot component design. We evaluate our solution both qualitatively and quantitatively by comparing it to a single-language version written in C.


IEEE International Conference on High Performance Computing, Data and Analytics | 1999

Computational experiments using distributed tools in a Web-based electronic notebook environment

Allen D. Malony; Jenifer L. Skidmore; Matthew J. Sottile

Computational environments used by scientists should provide high-level support for scientific processes that involve the integrated and systematic use of familiar abstractions from a laboratory setting, including notebooks, instruments, experiments, and analysis tools. However, doing so while hiding the complexities of the underlying computational platform is a challenge. ViNE is a web-based electronic notebook that implements a high-level interface for applying computational tools in scientific experiments in a location- and platform-independent manner. Using ViNE, a scientist can specify data and tools and construct experiments that apply them in well-defined procedures. ViNE's implementation of the experiment abstraction offers the scientist an easy-to-understand framework for building scientific processes. This paper discusses how ViNE implements computational experiments in distributed, heterogeneous computing environments.


European Conference on Parallel Processing | 2004

Co-array Python: A Parallel Extension to the Python Language

Craig Edward Rasmussen; Matthew J. Sottile; Jarek Nieplocha; Robert W. Numrich; Eric Jones

A parallel extension to the Python language is introduced that is modeled after the Co-Array Fortran extensions to Fortran 95. A new Python module, CoArray, has been developed to provide co-array syntax that allows a Python programmer to address co-array data on a remote processor. An example of Jacobi iteration using the CoArray module is shown and corresponding performance results are presented.
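
The CoArray module's API is not spelled out in the abstract, but the flavor of co-array addressing, reading data that belongs to a remote image directly from the local program, can be sketched with a stand-in class. Everything below (the class name, the call syntax, and the single-process "images") is hypothetical and only illustrates the halo-exchange pattern used in a Jacobi iteration.

# Hedged sketch of co-array-style addressing. This is NOT the CoArray module;
# all "images" live in one process purely for illustration.

import numpy as np

class ToyCoArray:
    """Illustrative stand-in: one array per 'image', indexable across images."""
    def __init__(self, num_images, local_size):
        self.images = [np.zeros(local_size) for _ in range(num_images)]
    def __call__(self, image):
        # Co-array-style syntax: arr(image)[index] addresses another image's data.
        return self.images[image]

# Toy 1-D Jacobi-like halo exchange across two "images".
u = ToyCoArray(num_images=2, local_size=6)
u(0)[:] = 1.0
u(1)[:] = 2.0

left_halo_for_image1 = u(0)[-1]   # image 1 reads its neighbor's boundary value
right_halo_for_image0 = u(1)[0]   # image 0 reads its neighbor's boundary value
print(left_halo_for_image1, right_halo_for_image0)  # 1.0 2.0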


High Performance Computing Systems and Applications | 2002

Life with Ed: a case study of a linux BIOS/BProc cluster

Sung-Eun Choi; Erik Hendriks; Ronald Minnich; Matthew J. Sottile; Aaron Marks

In this paper, we describe experiences with our 127-node/161-processor Alpha cluster testbed, Ed. Ed is unique for two distinct reasons. First, we have replaced the standard BIOS on the cluster nodes with the Linux BIOS, which loads Linux directly from non-volatile memory (Flash RAM). Second, the operating system provides a single-system image of the entire cluster, much like a traditional supercomputer. We discuss the advantages of such a cluster, including time to boot (101 seconds for 100 nodes), time to upgrade (the same as time to boot), and time to start processes (2.4 seconds for 15,000 processes). Additionally, we have discovered that certain predictions about the nature of terascale clusters, such as the need for hierarchical structure, are false. Finally, we argue that to achieve true scalability, terascale clusters must be built in the manner of Ed.


Computational Science and Engineering | 2013

ForOpenCL: transformations exploiting array syntax in Fortran for accelerator programming

Matthew J. Sottile; Craig Edward Rasmussen; Wayne Weseloh; Robert W. Robey; Daniel J. Quinlan; Jeffrey Overbey

Emerging GPU architectures for high performance computing are well suited to a data-parallel programming model. This paper presents preliminary work examining a programming methodology that provides Fortran programmers with access to these emerging systems. We use array constructs in Fortran to show how this infrequently exploited, standardised language feature is easily transformed to lower-level accelerator code. The transformations in ForOpenCL are based on a simple mapping from Fortran to OpenCL. We demonstrate, using a stencil code solving the shallow-water fluid equations, that the performance of the ForOpenCL compiler-generated transformations is comparable with that of hand-optimised OpenCL code.
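
The key input to the ForOpenCL transformations is whole-array syntax rather than explicit loops. As a rough analogy in NumPy (not the paper's Fortran input or its generated OpenCL), the sketch below writes a simple averaging stencil as a single whole-array expression, the form that maps naturally onto a per-element accelerator kernel; the field name and update rule are illustrative only, not the paper's shallow-water code.

# Analogy only: whole-array (data-parallel) stencil expressed without loops.
# The variable names and the 5-point averaging rule are hypothetical.

import numpy as np

h = np.random.rand(66, 66)  # field with a one-cell halo on each side
h_new = h.copy()

# Whole-array update of the interior: no explicit loops, so the operation is
# naturally data-parallel and maps onto a per-element accelerator kernel.
h_new[1:-1, 1:-1] = 0.25 * (h[:-2, 1:-1] + h[2:, 1:-1] +
                            h[1:-1, :-2] + h[1:-1, 2:])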

Collaboration


Dive into Matthew J. Sottile's collaborations.

Top Co-Authors

Craig Edward Rasmussen (Los Alamos National Laboratory)
Benjamin A. Allan (Sandia National Laboratories)
Robert C. Armstrong (Sandia National Laboratories)
Ronald Minnich (Los Alamos National Laboratory)
Sung-Eun Choi (Los Alamos National Laboratory)
Christopher D. Rickett (South Dakota School of Mines and Technology)