Steve Karmesin
Los Alamos National Laboratory
Publications
Featured research published by Steve Karmesin.
measurement and modeling of computer systems | 1998
Sameer Shende; Allen D. Malony; Janice E. Cuny; Peter H. Beckman; Steve Karmesin; Kathleen Lindlan
Performance measurement of parallel, object-oriented (OO) programs requires the development of instrumentation and analysis techniques beyond those used for more traditional languages. Performance events must be redefined for the conceptual OO programming model, and those events must be instrumented and tracked in the context of OO language abstractions, compilation methods, and runtime execution dynamics. In this paper, we focus on the profiling and tracing of C++ applications that have been written using a rich parallel programming framework for high-performance scientific computing. We address issues of class-based profiling, instrumentation of templates, runtime function identification, and polymorphic (type-based) profiling. Our solutions are implemented in the TAU portable profiling package, which also provides support for profiling groups and user-level timers. We demonstrate TAU's C++ profiling capabilities for real parallel applications, built from components of the ACTS toolkit. Future directions include work on runtime performance data access, dynamic instrumentation, and higher-level performance data analysis and visualization that relates object semantics with performance execution behavior.
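As an illustration of the kind of type-aware, user-level instrumentation the abstract describes, the sketch below times each instantiation of a templated function separately with a scoped timer. The Profiler and ScopedTimer classes are hypothetical stand-ins for this sketch, not TAU's actual macro-based API.

```cpp
#include <chrono>
#include <iostream>
#include <map>
#include <string>
#include <typeinfo>
#include <vector>

// Illustrative stand-in for a profiling backend; TAU's real interface is
// macro-based and far richer than this sketch.
struct Profiler {
    static std::map<std::string, double>& table() {
        static std::map<std::string, double> t;
        return t;
    }
    static void record(const std::string& name, double seconds) { table()[name] += seconds; }
    static void report() {
        for (const auto& kv : table())
            std::cout << kv.first << ": " << kv.second << " s\n";
    }
};

// Scoped, user-level timer: starts on construction, stops and records on
// destruction, so every return path of the instrumented function is covered.
class ScopedTimer {
    std::string name_;
    std::chrono::steady_clock::time_point start_;
public:
    explicit ScopedTimer(std::string name)
        : name_(std::move(name)), start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        std::chrono::duration<double> dt = std::chrono::steady_clock::now() - start_;
        Profiler::record(name_, dt.count());
    }
};

// A templated function instrumented per instantiation: the timer name includes
// the (compiler-mangled) type, giving a polymorphic, type-based breakdown.
template <class T>
T sumOfSquares(const std::vector<T>& v) {
    ScopedTimer t(std::string("sumOfSquares<") + typeid(T).name() + ">");
    T acc = T();
    for (const T& x : v) acc += x * x;
    return acc;
}

int main() {
    std::vector<float>  f(1000, 1.5f);
    std::vector<double> d(1000, 2.5);
    sumOfSquares(f);   // profiled under the float instantiation
    sumOfSquares(d);   // profiled separately under the double instantiation
    Profiler::report();
}
```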
Lecture Notes in Computer Science | 1998
Steve Karmesin; James A. Crotinger; Julian C. Cummings; Scott W. Haney; William J. Humphrey; John Reynders; Stephen Smith; Timothy J. Williams
POOMA is a templated C++ class library for use in the development of large-scale scientific simulations on serial and parallel computers. POOMA II is a new design and implementation of POOMA intended to add richer capabilities and greater flexibility to the framework. The new design employs a generic Array class that acts as an interface to, or view on, a wide variety of data representation objects referred to as engines. This design separates the interface and the representation of multidimensional arrays. The separation is achieved using compile-time techniques rather than virtual functions, and thus code efficiency is maintained. POOMA II uses PETE, the Portable Expression Template Engine, to efficiently represent complex mathematical expressions involving arrays and other objects. The representation of expressions is kept separate from expression evaluation, allowing the use of multiple evaluator mechanisms that can support nested where-block constructs, hardware-specific optimizations and different run-time environments.
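A minimal sketch of the interface/representation split described above: an Array templated on an engine type, so the binding is resolved at compile time with no virtual calls. BrickEngine and the member names are illustrative assumptions, not POOMA II's real engine interface.

```cpp
#include <cstddef>
#include <iostream>
#include <utility>
#include <vector>

// Illustrative "engine": the engine owns the data representation and exposes
// element access; the Array below is only a view on it.
template <class T>
class BrickEngine {                 // dense, contiguous 2-D storage
    std::vector<T> data_;
    std::size_t    nx_, ny_;
public:
    BrickEngine(std::size_t nx, std::size_t ny, T init = T())
        : data_(nx * ny, init), nx_(nx), ny_(ny) {}
    T&       operator()(std::size_t i, std::size_t j)       { return data_[j * nx_ + i]; }
    const T& operator()(std::size_t i, std::size_t j) const { return data_[j * nx_ + i]; }
    std::size_t sizeX() const { return nx_; }
    std::size_t sizeY() const { return ny_; }
};

// The Array is parameterized on its engine at compile time, so the
// interface/representation separation costs no virtual function calls.
template <class T, class Engine = BrickEngine<T>>
class Array {
    Engine engine_;
public:
    template <class... Args>
    explicit Array(Args&&... args) : engine_(std::forward<Args>(args)...) {}
    T&       operator()(std::size_t i, std::size_t j)       { return engine_(i, j); }
    const T& operator()(std::size_t i, std::size_t j) const { return engine_(i, j); }
};

int main() {
    Array<double> a(4, 4, 1.0);     // an Array viewed through a dense BrickEngine
    a(2, 3) = 42.0;
    std::cout << a(2, 3) << "\n";   // prints 42
}
```

Swapping in a different engine type (for example, a compressed or remote-data engine) changes the representation without touching code written against the Array interface.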
international conference on supercomputing | 1999
Suvas Vajracharya; Steve Karmesin; Peter H. Beckman; James A. Crotinger; Allen D. Malony; Sameer Shende; R. R. Oldehoeft; Stephen Smith
In the solution of large-scale numerical problems, parallel computing is becoming simultaneously more important and more difficult. The complex organization of today's multiprocessors with several memory hierarchies has forced the scientific programmer to make a choice between simple but unscalable code and scalable but extremely complex code that does not port to other architectures. This paper describes how the SMARTS runtime system and the POOMA C++ class library for high-performance scientific computing work together to exploit data parallelism in scientific applications while hiding the details of managing parallelism and data locality from the user. We present innovative algorithms, based on the macro-dataflow model, for detecting data parallelism and efficiently executing data-parallel statements on shared-memory multiprocessors. We also describe how these algorithms can be implemented on clusters of SMPs.
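The sketch below illustrates the macro-dataflow idea in miniature: each data-parallel statement is broken into per-block "iterates" carrying read/write sets, and dependences are inferred from conflicts on those blocks. Iterate, Scheduler, and the serialization rules shown are simplified assumptions for illustration, not the SMARTS implementation.

```cpp
#include <cstddef>
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Each data-parallel statement is split into per-block iterates that carry
// read/write sets; an iterate may run once all of its prerequisites have run.
struct Iterate {
    std::string              label;
    std::vector<int>         reads, writes;   // block ids touched
    std::function<void()>    work;
    std::vector<std::size_t> deps;            // indices of prerequisite iterates
};

class Scheduler {
    std::vector<Iterate> iterates_;
    std::map<int, std::size_t> lastWriter_;   // block id -> iterate that last wrote it
public:
    // Dependence detection: read-after-write and write-after-write conflicts on a
    // block serialize the two iterates; non-conflicting iterates could run in parallel.
    void submit(Iterate it) {
        for (int b : it.reads)
            if (lastWriter_.count(b)) it.deps.push_back(lastWriter_[b]);
        for (int b : it.writes) {
            if (lastWriter_.count(b)) it.deps.push_back(lastWriter_[b]);
            lastWriter_[b] = iterates_.size();
        }
        iterates_.push_back(std::move(it));
    }
    // Sequential stand-in for a thread pool: dependencies always point to
    // earlier submissions, so running in submission order respects them.
    void run() {
        for (const Iterate& it : iterates_) {
            std::cout << it.label << " (deps:";
            for (std::size_t d : it.deps) std::cout << ' ' << iterates_[d].label;
            std::cout << ")\n";
            it.work();
        }
    }
};

int main() {
    Scheduler s;
    // A = B + C on block 0, then D = A * 2 reading block 0: the second
    // iterate depends on the first through its write to block 0.
    s.submit({"A=B+C[blk0]", {1, 2}, {0}, [] {}, {}});
    s.submit({"D=A*2[blk0]", {0},    {3}, [] {}, {}});
    s.run();
}
```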
Selected Papers from the International Seminar on Generic Programming | 1998
James A. Crotinger; Julian C. Cummings; Scott W. Haney; William Humphrey; Steve Karmesin; John Reynders; Stephen Smith; Timothy J. Williams
POOMA is a C++ framework for developing portable scientific applications for serial and parallel computers using high-level physical abstractions. PETE is a general-purpose expression-template library employed by POOMA to implement expression evaluation. This paper discusses generic programming techniques that are used to achieve flexibility and high performance in both POOMA and PETE. POOMA's Array class factors the data representation and look-up into a generic engine concept. PETE's expression templates are used to build and operate efficiently on expressions. PETE is implemented using generic techniques that allow it to adapt to a variety of client-class interfaces and to provide a powerful and flexible compile-time expression-tree-traversal mechanism.
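A compact expression-template sketch in the spirit of PETE (all class names here are illustrative): the expression a + b * c is captured as a lightweight compile-time tree and only evaluated, in a single loop with no intermediate vectors, when it is assigned to a destination.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// A node of the compile-time expression tree: holds references to its
// operands and applies Op element by element on demand.
template <class L, class Op, class R>
struct BinaryExpr {
    const L& lhs;
    const R& rhs;
    double operator[](std::size_t i) const { return Op::apply(lhs[i], rhs[i]); }
    std::size_t size() const { return lhs.size(); }
};

struct Plus  { static double apply(double a, double b) { return a + b; } };
struct Times { static double apply(double a, double b) { return a * b; } };

class Vec {
    std::vector<double> data_;
public:
    explicit Vec(std::size_t n, double v = 0.0) : data_(n, v) {}
    double  operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i)       { return data_[i]; }
    std::size_t size() const { return data_.size(); }
    // Evaluation is separate from expression construction: the whole tree
    // collapses into this one loop.
    template <class Expr>
    Vec& operator=(const Expr& e) {
        for (std::size_t i = 0; i < size(); ++i) data_[i] = e[i];
        return *this;
    }
};

template <class L, class R>
BinaryExpr<L, Plus, R> operator+(const L& a, const R& b) { return {a, b}; }

template <class L, class R>
BinaryExpr<L, Times, R> operator*(const L& a, const R& b) { return {a, b}; }

int main() {
    Vec a(4, 1.0), b(4, 2.0), c(4, 3.0), r(4);
    r = a + b * c;                   // one loop, no temporary Vec objects
    std::cout << r[0] << "\n";       // prints 7
}
```

Because the tree type encodes the whole expression, an evaluator can walk it at compile time and choose different evaluation strategies, which is the separation of representation and evaluation that the abstract describes.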
conference on scientific computing | 1997
William Humphrey; Steve Karmesin; Federico Bassetti; John Reynders
The POOMA framework is a C++ class library for the development of large-scale parallel scientific applications. POOMA's Field class implements a templated, multidimensional, data-parallel array that partitions data in a simulation domain into sub-blocks. These subdomain blocks are used on a parallel computer in data-parallel Field expressions. In this paper we describe the design of Fields, their implementation in the POOMA framework, and their performance on a Silicon Graphics Inc. Origin 2000. We focus on the aspects of the Field implementation that relate to efficient memory use and improvement of run-time performance: reducing the number of temporaries through expression templates, reducing the total memory used by compressing constant regions, and performing calculations on sparsely populated Fields by using sparse index lists.
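The following sketch shows one way the "compress constant regions" idea can work: a block stores a single value while all of its elements are equal, and expands to full storage only on the first differing write. CompressibleBlock is a hypothetical illustration, not POOMA's actual engine.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

template <class T>
class CompressibleBlock {
    bool           compressed_ = true;
    T              value_      = T();     // the shared value while compressed
    std::vector<T> data_;                 // full storage once expanded
    std::size_t    n_;
public:
    explicit CompressibleBlock(std::size_t n, T v = T()) : value_(v), n_(n) {}

    T read(std::size_t i) const { return compressed_ ? value_ : data_[i]; }

    void write(std::size_t i, T v) {
        if (compressed_) {
            if (v == value_) return;      // still uniform: stay compressed
            data_.assign(n_, value_);     // first differing write: expand
            compressed_ = false;
        }
        data_[i] = v;
    }

    bool compressed() const { return compressed_; }
    std::size_t memoryElements() const { return compressed_ ? 1 : n_; }
};

int main() {
    CompressibleBlock<double> blk(1 << 20, 0.0);     // a large all-zero block
    std::cout << blk.memoryElements() << "\n";       // 1: stored as a single value
    blk.write(42, 3.14);                             // breaks uniformity
    std::cout << blk.memoryElements() << "\n";       // full storage now
}
```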
european conference on object-oriented programming | 1998
J. C. Marshall; L. A. Ankeny; S. P. Clancy; J. H. Hall; J. H. Heiken; K. S. Holian; Stephen R. Lee; G. R. McNamara; J. W. Painter; M. E. Zander; Julian C. Cummings; Scott W. Haney; Steve Karmesin; William Humphrey; John Reynders; T. W. Williams; R. L. Graham
The authors describe a C++ physics development environment, called the Tecolote Framework, which allows model developers to work more efficiently and accurately. This Framework contains a variety of meshes, operators, and parallel fields, as well as an input/output (I/O) subsystem and graphics capabilities. Model developers can inherit Tecolote's generic model interface and use the Framework's high-level field and operator components to write parallel physics equations. New Tecolote models are easily registered with the Framework, and they can be built and called directly from the input file, which greatly expedites model installation. In the process of developing an extensible and robust framework, the authors have found appealing solutions to some of the serious problems they encountered when parallelizing and extending older codes. They also discuss memory and performance issues for a large hydrodynamics application built in this Framework.
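A minimal sketch of the registration-and-factory pattern suggested by the abstract: a model inherits a generic interface, registers a factory under a string name, and is then constructed by name, as an input file would request it. Model, ModelRegistry, and Hydro are illustrative names, not Tecolote's actual interface.

```cpp
#include <functional>
#include <iostream>
#include <map>
#include <memory>
#include <string>

// Generic model interface that concrete physics models inherit.
class Model {
public:
    virtual ~Model() = default;
    virtual void advance(double dt) = 0;   // one physics step
};

// Registry mapping model names (as they would appear in an input file)
// to factories that construct the corresponding model.
class ModelRegistry {
    using Factory = std::function<std::unique_ptr<Model>()>;
    std::map<std::string, Factory> factories_;
public:
    static ModelRegistry& instance() {
        static ModelRegistry r;
        return r;
    }
    void add(const std::string& name, Factory f) { factories_[name] = std::move(f); }
    std::unique_ptr<Model> create(const std::string& name) const {
        return factories_.at(name)();
    }
};

// A new model only needs to inherit the interface and register itself once.
class Hydro : public Model {
public:
    void advance(double dt) override { std::cout << "hydro step dt=" << dt << "\n"; }
};

namespace {
const bool hydroRegistered = [] {
    ModelRegistry::instance().add("hydro", [] { return std::make_unique<Hydro>(); });
    return true;
}();
}

int main() {
    // In the framework this name would come from the input file.
    std::unique_ptr<Model> m = ModelRegistry::instance().create("hydro");
    m->advance(0.01);
}
```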
Archive | 1998
Julian C. Cummings; James A. Crotinger; Scott W. Haney; William Humphrey; Steve Karmesin; John Reynders; Al Stephen; Timothy Joe Williams
Archive | 1999
Scott W. Haney; James A. Crotinger; Steve Karmesin; Stephen Smith
Archive | 2002
Steve Karmesin; Scott W. Haney; B. Humphrey; Jonathon N. Cummings; Tiffani L. Williams; James A. Crotinger; Stephen J. Smith; Eugene M. Gavrilov
parallel and distributed processing techniques and applications | 1999
Suvas Vajracharya; Peter H. Beckman; Steve Karmesin; Katarzyna Keahey; R. R. Oldehoeft; Craig Edward Rasmussen