Henry G. Dietz
University of Kentucky
Publications
Featured research published by Henry G. Dietz.
Signal Processing | 2007
Kevin D. Donohue; Jens Hannemann; Henry G. Dietz
The performance of sound source location (SSL) algorithms with microphone arrays can be enhanced by processing signals prior to the delay and sum operation. The phase transform (PHAT) has been shown to improve SSL images, especially in reverberant environments. This paper introduces a modification, referred to as the PHAT-β transform, that varies the degree of spectral magnitude information used by the transform through a single parameter. Performance results are computed using a Monte Carlo simulation of an eight-element perimeter array with a receiver operating characteristic (ROC) analysis for detecting single and multiple sound sources. In addition, a Fisher's criterion performance measure is computed for target and noise peak separability and compared to the ROC results. Results show that the standard PHAT significantly improves detection performance for broadband signals, especially in high levels of reverberation noise, and to a lesser degree for noise from other coherent sources. For narrowband targets the PHAT typically results in significant performance degradation; however, the PHAT-β can achieve performance improvements for both narrowband and broadband signals. Finally, the performance for real speech signal samples is examined and shown to exhibit properties similar to both the simulated broadband and narrowband cases, suggesting the use of β values between 0.5 and 0.7 for array applications with general signals.
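To make the transform concrete, the following NumPy sketch computes a generalized cross-correlation with PHAT-β weighting for a single microphone pair: the cross power spectrum is normalized by its magnitude raised to β, so β = 0 gives plain cross-correlation and β = 1 the standard PHAT. The function name, the two-microphone reduction, and the sample rate are illustrative assumptions, not details from the paper (which evaluates an eight-element delay-and-sum array).

```python
import numpy as np

def gcc_phat_beta(x1, x2, beta=0.6, fs=16000):
    """Cross-correlate two microphone signals with PHAT-beta weighting.

    The cross power spectrum G(f) = X1(f) X2*(f) is divided by
    |G(f)|**beta: beta=0 keeps full magnitude information (ordinary
    cross-correlation), beta=1 discards it entirely (standard PHAT).
    """
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    G = X1 * np.conj(X2)                       # cross power spectrum
    w = np.maximum(np.abs(G) ** beta, 1e-12)   # avoid divide-by-zero
    cc = np.fft.irfft(G / w, n=n)
    lag = np.argmax(np.abs(cc))                # correlation peak
    if lag > n // 2:                           # unwrap circular lag
        lag -= n
    return cc, lag / fs                        # time difference (s)
```

The default β of 0.6 reflects the abstract's suggestion of values between 0.5 and 0.7 for general signals.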
languages and compilers for parallel computing | 2000
Henry G. Dietz; Timothy Mattox
A Flat Neighborhood Network (FNN) is a new interconnection network architecture that can provide very low latency and high bisection bandwidth at a minimal cost for large clusters. However, unlike more traditional designs, FNNs generally are not symmetric. Thus, although an FNN by definition offers a certain base level of performance for random communication patterns, both the network design and communication (routing) schedules can be optimized to make specific communication patterns achieve significantly more than the basic performance. The primary mechanism for design of both the network and communication schedules is a set of genetic search algorithms (GAs) that derive good designs from specifications of particular communication patterns. This paper centers on the use of these GAs to compile the network wiring pattern, basic routing tables, and code for specific communication patterns that will use an optimized schedule rather than simply applying the basic routing.
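The defining FNN property is that every pair of nodes shares at least one switch, so any two nodes communicate through a single switch hop. A full genetic search over wirings is beyond a short example, but the feasibility test such a search must score is simple; the sketch below is illustrative only, with hypothetical node and switch names.

```python
from itertools import combinations

def is_flat_neighborhood(nics):
    """True if every pair of nodes shares at least one switch.

    `nics` maps a node name to the set of switches its NICs are
    wired into; a shared switch means one-hop latency between the
    pair, which is the FNN guarantee a genetic search must preserve
    while optimizing for specific communication patterns.
    """
    return all(nics[a] & nics[b] for a, b in combinations(nics, 2))

# Toy wiring: 4 nodes with 2 NICs each across 3 switches (0, 1, 2).
wiring = {
    "n0": {0, 1},
    "n1": {0, 2},
    "n2": {1, 2},
    "n3": {0, 1},
}
print(is_flat_neighborhood(wiring))   # True: every pair shares a switch
```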
conference on high performance computing (supercomputing) | 2000
Thomas Hauser; Timothy Mattox; Raymond P. LeBeau; Henry G. Dietz; P. George Huang
Direct numerical simulation (DNS) of the Navier-Stokes equations is an important technique for the future of computational fluid dynamics (CFD) in engineering applications. However, DNS requires massive computing resources. This paper presents a new approach for implementing high-cost DNS CFD using low-cost cluster hardware. After describing the DNS CFD code DNSTool, the paper focuses on the techniques and tools that we have developed to customize the performance of a cluster implementation of this application. This tuning of system performance involves both recoding of the application and careful engineering of the cluster design. Using the cluster KLAT2 (Kentucky Linux Athlon Testbed 2), while DNSTool cannot match the $0.64 per MFLOPS that KLAT2 achieves on single-precision ScaLAPACK, it is very efficient; DNSTool on KLAT2 achieves price/performance of $2.75 per MFLOPS double precision and $1.86 per MFLOPS single precision. Further, the code and tools are all, or will soon be, made freely available as full source code.
international parallel and distributed processing symposium | 2006
Henry G. Dietz; William R. Dieter
The low cost of clusters built using commodity components has made it possible for many more users to purchase their own supercomputer. However, even modest-sized clusters make significant demands on the power and cooling infrastructure. Minimizing the impact of problems after they are detected is not as effective as avoiding problems altogether. This paper is about achieving the best system performance by predicting and avoiding power and cooling problems. Although measuring the power and thermal properties of a code is not trivial, the primary issue is making predictions sufficiently far in advance that they can drive predictive, rather than merely reactive, control at runtime. This paper presents new compiler analysis supporting interprocedural power prediction, along with a variety of other compiler and runtime technologies that make feed-forward control feasible. The techniques apply to most computer systems, but some properties specific to clusters and parallel supercomputing are used where appropriate.
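As a rough illustration of feed-forward (rather than reactive) control, the sketch below picks a clock frequency for each upcoming code region from a compiler-supplied power prediction, before the region executes. The power table, budget, and cubic scaling model are invented for the example; the paper's interprocedural analysis and control machinery are far more involved.

```python
# Hypothetical feed-forward power control; all names and numbers are
# illustrative, not from the paper.

POWER_BUDGET_W = 90.0

# Compiler-predicted power draw (watts) of each region at full speed.
predicted_power = {"fft_phase": 110.0, "io_phase": 40.0, "solve_phase": 95.0}

freq_steps = [1.0, 0.8, 0.6]           # fraction of nominal frequency

def scaled_power(p_full, f):
    """Crude model: power ~ f * V^2 with voltage tracking frequency."""
    return p_full * f**3

def frequency_for_region(region):
    """Pick the highest frequency whose *predicted* power fits the
    budget -- decided before the region runs (feed-forward), not
    after a thermal sensor trips (reactive)."""
    p = predicted_power[region]
    for f in freq_steps:
        if scaled_power(p, f) <= POWER_BUDGET_W:
            return f
    return freq_steps[-1]              # fall back to the slowest step

for r in predicted_power:
    print(r, frequency_for_region(r))  # e.g. fft_phase runs at 0.8x
```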
IEEE Computer Architecture Letters | 2007
William R. Dieter; Akil Kaveti; Henry G. Dietz
Some processors designed for consumer applications, such as graphics processing units (GPUs) and the CELL processor, promise outstanding floating-point performance for scientific applications at commodity prices. However, IEEE single precision is the most precise floating-point data type these processors directly support in hardware. Pairs of native floating-point numbers can be used to represent a base result and a residual term to increase accuracy, but the resulting order-of-magnitude slowdown dramatically reduces the price/performance advantage of these systems. By adding a few simple microarchitectural features, acceptable accuracy can be obtained with relatively little performance penalty. To reduce the cost of native-pair arithmetic, a residual register is used to hold information that would normally have been discarded after each floating-point computation. The residual register dramatically simplifies the code, providing both lower latency and better instruction-level parallelism.
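The pair representation referred to here builds on error-free transformations such as the classic TwoSum step below (a standard software technique, shown for illustration; it is not the paper's hardware design): several extra floating-point operations recover the exact rounding error of one addition. The proposed residual register would capture that error as a by-product of the original add, which is where the savings come from.

```python
def two_sum(a, b):
    """Error-free addition: return (s, r) with s = fl(a + b) and
    a + b == s + r exactly. The residual r is what a residual
    register would capture for free in hardware; in software it
    costs the extra operations below."""
    s = a + b
    bv = s - a                 # the part of b that made it into s
    av = s - bv                # the part of a that made it into s
    r = (a - av) + (b - bv)    # rounding error of the addition
    return s, r

s, r = two_sum(1.0, 1e-20)
print(s, r)   # 1.0 1e-20 : the tiny addend survives in the residual
```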
languages and compilers for parallel computing | 2002
Henry G. Dietz; Timothy Mattox
In modern computers, a single “random” access to main memory often takes as much time as executing hundreds of instructions. Rather than using traditional compiler approaches that enhance locality by interchanging loops, reordering data structures, etc., this paper proposes the radical concept of using aggressive data compression technology to improve hierarchical memory performance by reducing memory address reference entropy. In some cases, conventional compression technology can be adapted. However, where variable access patterns must be permitted, other compression techniques must be used. For the special case of random access to elements of sparse matrices, data structures and compiler technology already exist. Our approach is much more general, using compressive hash functions to implement random-access lookup tables. Techniques that can be used to improve the effectiveness of any compression method in reducing memory access entropy are also discussed.
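As a simple illustration of the underlying idea (this is plain dictionary encoding, not the paper's more general compressive hash functions): when a large lookup table holds few distinct values, random accesses can be redirected through a much smaller encoded table, shrinking the working set the memory hierarchy must cover.

```python
import numpy as np

rng = np.random.default_rng(0)
# A big table that is highly redundant: only 3 distinct values.
table = rng.choice([0.0, 0.5, 1.0], size=1_000_000)

# Compress: a tiny dictionary of distinct values plus one byte per entry.
values, codes = np.unique(table, return_inverse=True)
codes = codes.astype(np.uint8)        # 1 byte/entry instead of 8

def lookup(i):
    """Random access now touches a 1 MB code array and a tiny
    dictionary instead of an 8 MB table of doubles."""
    return values[codes[i]]

i = rng.integers(0, len(table))
assert lookup(i) == table[i]
print(table.nbytes, "->", codes.nbytes + values.nbytes, "bytes")
```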
languages and compilers for parallel computing | 2009
Henry G. Dietz; B. Dalton Young
Programming heterogeneous parallel computer systems is notoriously difficult, but MIMD models have proven to be portable across multi-core processors, clusters, and massively parallel systems. It would be highly desirable for GPUs (Graphics Processing Units) also to be able to leverage algorithms and programming tools designed for MIMD targets. Unfortunately, most GPU hardware implements a very restrictive multi-threaded SIMD-based execution model. This paper presents a compiler, assembler, and interpreter system that allows a GPU to implement a richly featured MIMD execution model that supports shared-memory communication, recursion, etc. Through a variety of careful design choices and optimizations, reasonable efficiency is obtained on NVIDIA CUDA GPUs. The discussion covers both the methods used and the motivation in terms of the relevant aspects of GPU architecture.
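The core trick for running MIMD code on SIMD hardware can be sketched in a few lines: every lane keeps its own program counter, and the interpreter executes one opcode at a time under a lane mask so that only lanes whose current instruction matches participate. The toy interpreter below (NumPy, with a hypothetical three-instruction ISA) shows the masking pattern only; the paper's system adds shared memory, recursion, and GPU-specific optimizations.

```python
import numpy as np

OPS = {"INC": 0, "DEC": 1, "HALT": 2}
progs = np.array([                    # one tiny program per lane
    [0, 0, 2],                        # INC, INC, HALT
    [1, 2, 2],                        # DEC, HALT
    [0, 1, 2],                        # INC, DEC, HALT
])
pc   = np.zeros(3, dtype=int)         # per-lane program counters
acc  = np.zeros(3, dtype=int)         # per-lane accumulators
live = np.ones(3, dtype=bool)         # lanes that have not halted

while live.any():
    op = progs[np.arange(3), pc]      # each lane's current opcode
    for code in np.unique(op[live]):  # one SIMD pass per opcode
        mask = live & (op == code)    # lanes taking this opcode now
        if code == OPS["INC"]:
            acc[mask] += 1
        elif code == OPS["DEC"]:
            acc[mask] -= 1
        else:                         # HALT: retire these lanes
            live[mask] = False
            mask = np.zeros_like(mask)  # halted lanes stop advancing
        pc[mask] += 1

print(acc)                            # lane results: 2, -1, 0
```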
languages and compilers for parallel computing | 2003
Henry G. Dietz; Shashi Deepa Arcot; Sujana Gorantla
international parallel and distributed processing symposium | 2005
Timothy Mattox; Henry G. Dietz; William R. Dieter
languages and compilers for parallel computing | 2005
Shashi Deepa Arcot; Henry G. Dietz; Sarojini Priyadarshini Rajachidambaram