David Cronk
University of Tennessee
Publications
Featured research published by David Cronk.
European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface | 2001
Shirley Moore; David Cronk; Kevin S. London; Jack J. Dongarra
In order to produce MPI applications that perform well on today's parallel architectures, programmers need effective tools for collecting and analyzing performance data. A variety of such tools, both commercial and research, are becoming available. This paper reviews and evaluates the available cross-platform MPI performance analysis tools.
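Cross-platform MPI performance tools typically hook applications through the MPI standard's profiling interface (PMPI), which name-shifts every MPI routine so a measurement library can interpose without modifying the application. A minimal sketch of that mechanism; this is illustrative, not code from any of the reviewed tools:

```c
/* Minimal sketch of the MPI profiling interface (PMPI), the portable
 * hook that cross-platform MPI performance tools commonly build on. */
#include <mpi.h>
#include <stdio.h>

static double send_time  = 0.0;  /* accumulated time inside MPI_Send */
static long   send_calls = 0;

/* The linker resolves MPI_Send to this wrapper; the real work is done
 * by the name-shifted entry point PMPI_Send. */
int MPI_Send(const void *buf, int count, MPI_Datatype dt,
             int dest, int tag, MPI_Comm comm)
{
    double t0 = MPI_Wtime();
    int rc = PMPI_Send(buf, count, dt, dest, tag, comm);
    send_time += MPI_Wtime() - t0;
    send_calls++;
    return rc;
}

int MPI_Finalize(void)
{
    int rank;
    PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("rank %d: %ld MPI_Send calls, %.6f s total\n",
           rank, send_calls, send_time);
    return PMPI_Finalize();
}
```

Because only the link step changes, the application source stays untouched, which is what makes such tools portable across MPI implementations.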
HPCMP Users Group Conference | 2005
Shirley Moore; David Cronk; Felix Wolf; Avi Purkayastha; Patricia J. Teller; Robert Araiza; Maria Gabriela Aguilera; Jamie Nava
Large scientific applications developed as recently as five to ten years ago are often at a disadvantage in current computing environments. Because acquisition decisions are frequently made for reasons such as price/performance, continuing production runs often requires porting large scientific applications to architectures completely different from the ones on which they were developed. Since the porting step does not include optimizations for the new architecture, performance often suffers due to various architectural features. The Programming Environment and Training (PET) Computational Environments (CE) team has developed and deployed procedures and mechanisms for collecting performance data and for profiling and optimizing these applications based on that data. The paper illustrates some of these procedures and mechanisms.
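Hardware performance counters are one standard source of such data, and the PAPI library is a typical portable way to read them. A hedged sketch using PAPI's classic high-level counter interface (present in PAPI releases of that era); the event choices are illustrative, not the PET team's actual procedure:

```c
/* Illustrative use of PAPI to count cycles and L2 cache misses around
 * a kernel; a sketch of this kind of data collection, not the PET
 * team's actual tooling. */
#include <papi.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int events[2] = { PAPI_TOT_CYC, PAPI_L2_TCM };
    long long counts[2];

    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
        exit(1);
    if (PAPI_start_counters(events, 2) != PAPI_OK)
        exit(1);

    /* ... kernel under study ... */

    if (PAPI_stop_counters(counts, 2) != PAPI_OK)
        exit(1);
    printf("cycles=%lld  L2 misses=%lld\n", counts[0], counts[1]);
    return 0;
}
```

Counter data like this is what reveals the "various architectural features" (cache sizes, memory bandwidth) that hurt a freshly ported code.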
IEEE International Conference on High Performance Computing, Data, and Analytics | 2009
Wyatt Spear; Sameer Shende; Allen D. Malony; Ricardo Portillo; Patricia J. Teller; David Cronk; Shirley Moore; Dan Terpstra
Although there are a number of performance tools available to Department of Defense (DoD) users, the process of performance analysis and tuning has yet to become an integral part of the DoD software development cycle. Instead, performance analysis and tuning is the domain of a small number of experts who cannot possibly address all the codes that need attention. We believe the main reasons for this are a lack of knowledge about these tools, the real or perceived steep learning curve required to use them, and the absence of a centralized method that incorporates their use in the software development cycle. This paper presents ongoing efforts to enable a larger number of DoD High Performance Computing Modernization Program (HPCMP) users to benefit from available performance analysis tools by integrating them into the Eclipse Parallel Tools Platform (Eclipse/PTP), an integrated development environment for parallel programs.
IEEE International Conference on High Performance Computing, Data, and Analytics | 2007
Sameer Shende; Allen D. Malony; Shirley Moore; David Cronk
There is a growing awareness that high-end performance evaluation and tuning requires holistic program analysis. In addition to CPU performance characterization, observation of memory, network, and input/output (I/O) performance can help to identify execution bottlenecks related to these factors. Correctness of memory and communication operations is also an issue and can affect performance indirectly. This paper describes extensions to the TAU performance system to incorporate direct source-level code instrumentation for tracking dynamic memory management in Fortran codes that use allocate and deallocate statements. TAU's lightweight profiling can then generate a detailed report of memory usage, including the sizes of memory blocks allocated and deallocated, with precise program attribution: variable name, source line number, and file name. We report on results and experiences in applying TAU to the PTURBO application.
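The attribution idea can be sketched in C with a call-site macro; this illustrates source-level memory instrumentation in general, not TAU's mechanism, which rewrites the Fortran allocate/deallocate statements themselves:

```c
/* Sketch of source-level memory instrumentation with precise program
 * attribution (variable name, file, line). Illustrative only;
 * deallocation tracking is omitted for brevity. */
#include <stdio.h>
#include <stdlib.h>

static size_t bytes_live = 0;

static void *traced_malloc(size_t sz, const char *var,
                           const char *file, int line)
{
    bytes_live += sz;
    fprintf(stderr, "alloc %zu B for %s at %s:%d (live: %zu B)\n",
            sz, var, file, line, bytes_live);
    return malloc(sz);
}

/* The macro captures the attribution automatically at each call site. */
#define TRACED_MALLOC(var, sz) \
    (var = traced_malloc((sz), #var, __FILE__, __LINE__))

int main(void)
{
    double *a;
    TRACED_MALLOC(a, 1000 * sizeof *a);
    free(a);
    return 0;
}
```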
HPCMP Users Group Conference | 2006
Shirley Moore; David Cronk; Sameer Shende; Allen D. Malony
Performance of computationally intensive applications often depends critically on the floating-point and memory performance of nested loop structures. This paper describes extensions to the Tuning and Analysis Utilities (TAU) parallel performance system that implement automated instrumentation of parallel C/C++ and Fortran programs to collect loop-level profile data. Link-time and run-time options for configuring the instrumented version of the code to perform various types of measurements, such as time- and hardware-counter-based profiling, are described. Finally, examples are given of collecting and analyzing loop-level profile data for several DoD applications.
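Loop-level profiling amounts to placing probes at loop-nest entry and exit; the TAU extensions insert these automatically. A hand-instrumented sketch of what such instrumentation effectively produces (timer placement and names are illustrative):

```c
/* Hand-timed loop nest, illustrating the probes that automated
 * loop-level instrumentation would insert around each nest. */
#include <stdio.h>
#include <time.h>

#define N 1024
static double a[N][N], b[N][N];

int main(void)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);   /* loop-entry probe */
    for (int i = 1; i < N - 1; i++)
        for (int j = 1; j < N - 1; j++)
            b[i][j] = 0.25 * (a[i-1][j] + a[i+1][j] +
                              a[i][j-1] + a[i][j+1]);
    clock_gettime(CLOCK_MONOTONIC, &t1);   /* loop-exit probe */

    double secs = (t1.tv_sec - t0.tv_sec) +
                  (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    printf("stencil loop nest: %.6f s\n", secs);
    return 0;
}
```

With hardware-counter-based measurement configured instead of time, the same probes would read counters (e.g., via PAPI) at entry and exit.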
2004 Users Group Conference (DOD_UGC'04) | 2004
Daniel M. Pressel; David Cronk; Sameer Shende
A common complaint when dealing with the performance of computationally intensive scientific applications on parallel computers is that programs exist to predict the performance of radar systems, missiles and artillery shells, drugs, etc., but no one knows how to predict the performance of these applications on a parallel computer. Actually, that is not quite true. A more accurate statement is that no one knows how to predict the performance of these applications on a parallel computer in a reasonable amount of time. Penvelope is an attempt to remedy this situation. It is an extension of Amdahl's law and Gustafson's work on scaled speedup that takes into account the cost of interprocessor communication and operating system overhead, yet is simple enough that it was implemented as an Excel spreadsheet.
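For reference, the two baseline laws, plus an illustrative extension; the communication term c(p) and overhead term o(p) below are assumptions standing in for Penvelope's actual model, which the abstract does not spell out:

```latex
% Amdahl's law: fixed problem size, serial fraction s, p processors.
S_{\text{Amdahl}}(p) = \frac{1}{s + \frac{1-s}{p}}

% Gustafson's scaled speedup: problem size grows with p.
S_{\text{Gustafson}}(p) = s + p\,(1-s)

% Illustrative Penvelope-style extension (assumed form, not the actual
% model): charge normalized per-processor communication cost c(p) and
% operating-system overhead o(p) against the run.
S(p) = \frac{1}{s + \frac{1-s}{p} + c(p) + o(p)}
```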
HPCMP Users Group Conference | 2006
Roberto Araiza; Jaime Nava; Alan Taylor; Patricia J. Teller; David Cronk; Shirley Moore
The analysis of modern, parallelized applications, such as scientific modeling codes, is of interest to a variety of people within the computing community of the Department of Defense (DoD). Those desiring insight into the performance of these large programs include application users, application programmers/developers, portfolio and center managers, and others. The needed analysis requires examining large data sets obtained from various performance analysis sources including, but not limited to, hardware counters, software event counters, communications event counters, and instrumentation code inserted into programs. The PCAT (Performance Analysis Team) at the University of Texas at El Paso (UTEP) has developed a suite of tools, consisting of a performance database access tool and four different visualization methods, to aid diverse DoD users in analyzing certain performance issues associated with serial and, especially, parallel programs. The tools are written in Java and provide multiple views of different aspects of performance metrics associated with a performance database. Preliminary analysis of two different codes resulted in PCAT users identifying possible sources of performance degradation solely from examination of performance metrics, without access to the source code.
2005 Users Group Conference (DOD-UGC'05) | 2005
David Cronk; Graham E. Fagg; Susan Emeny; Scot Tucker
Many applications, particularly in the area of Signal and Image Processing (SIP), make use of what is referred to as a pipeline architecture. In these pipelined architectures, data are collected from some source and fed into a system for computation. This system is designed as a pipeline: different processes are responsible for managing the data during different stages of the computation, and the data move from one module, or task, to another, as in a pipeline. These types of systems have inherent load-balancing problems. During times of low activity the pipeline is lightly utilized, wasting many of the resources making up the pipeline. During times of increased activity the pipeline is filled and a backlog develops because there are insufficient resources to process all the available data. For single-pipeline architectures this problem is exacerbated by the fact that data move through the pipeline at the speed of the slowest process. This paper explores the use of MPI-2's dynamic process management functionality as a method for handling this load imbalance.
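The MPI-2 entry point for this is MPI_Comm_spawn, which lets a running stage start additional worker processes when a backlog forms. A minimal sketch of the idea, assuming a hypothetical worker executable stage_worker and a placeholder backlog test; the paper's actual scheme is not reproduced here:

```c
/* Sketch: grow a pipeline stage at run time with MPI_Comm_spawn when a
 * backlog is detected. "stage_worker" is a hypothetical worker binary;
 * the backlog test stands in for a real policy. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int backlog  = 100;   /* queued work items (placeholder) */
    int capacity = 10;    /* items this stage absorbs per cycle */

    if (backlog > capacity) {
        MPI_Comm workers;  /* intercommunicator to the new processes */
        int nnew = 4;      /* extra workers for the overloaded stage */
        MPI_Comm_spawn("stage_worker", MPI_ARGV_NULL, nnew,
                       MPI_INFO_NULL, 0, MPI_COMM_SELF,
                       &workers, MPI_ERRCODES_IGNORE);
        /* Hand each new worker a share of the queued items. */
        for (int i = 0; i < nnew; i++)
            MPI_Send(&backlog, 1, MPI_INT, i, 0, workers);
        printf("spawned %d extra workers\n", nnew);
    }

    MPI_Finalize();
    return 0;
}
```

Spawning on demand lets a lightly loaded pipeline release resources and a saturated one acquire them, rather than sizing every stage for the worst case.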
Lecture Notes in Computer Science | 1998
David Cronk; Piyush Mehrotra
The use of lightweight threads in a distributed memory environment is becoming common. As distributed lightweight threads have become popular, there has been increased interest in migrating threads across process boundaries. One possible use of thread migration is to perform dynamic load balancing. This paper introduces our implementation of a dynamic load balancing mechanism using thread migration as the means for load redistribution. We provide a brief description of the thread migration mechanism and a detailed description of the load balancing layer. We also present the performance of this load balancing mechanism on a variety of parallel applications.
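A minimal sketch of the kind of decision such a load-balancing layer makes, with thread migration as the redistribution step; the sender-initiated policy, thresholds, and migrate_thread() stub here are assumptions, not the paper's implementation:

```c
/* Sketch of a sender-initiated load-balancing decision: if this node's
 * runnable-thread count exceeds the average by a threshold, migrate
 * threads to the least loaded peers. migrate_thread() stands in for an
 * actual thread-migration mechanism. */
#include <stdio.h>

#define NODES 4
#define THRESHOLD 2   /* hypothetical imbalance tolerance */

static void migrate_thread(int from, int to)
{
    printf("migrate one thread: node %d -> node %d\n", from, to);
}

int main(void)
{
    int load[NODES] = { 9, 3, 4, 4 };  /* runnable threads per node */
    int self = 0;

    int total = 0;
    for (int i = 0; i < NODES; i++)
        total += load[i];
    int avg = total / NODES;

    /* Shed threads until this node is within THRESHOLD of the average. */
    while (load[self] > avg + THRESHOLD) {
        int target = (self + 1) % NODES;
        for (int i = 0; i < NODES; i++)
            if (load[i] < load[target])
                target = i;      /* least loaded peer */
        migrate_thread(self, target);
        load[self]--;
        load[target]++;
    }
    return 0;
}
```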
Archive | 1995
Matthew Haines; Piyush Mehrotra; David Cronk