Ulrich Kremer | Researchain

Publication

Featured research published by Ulrich Kremer.

programming language design and implementation | 2003

The design, implementation, and evaluation of a compiler algorithm for CPU energy reduction

Chung-Hsing Hsu; Ulrich Kremer

This paper presents the design and implementation of a compiler algorithm that effectively optimizes programs for energy usage using dynamic voltage scaling (DVS). The algorithm identifies program regions where the CPU can be slowed down with negligible performance loss. It is implemented as a source-to-source level transformation using the SUIF2 compiler infrastructure. Physical measurements on a high-performance laptop show that total system (i.e., laptop) energy savings of up to 28% can be achieved with performance degradation of less than 5% for the SPECfp95 benchmarks. On average, the system energy and energy-delay product are reduced by 11% and 9%, respectively, with a performance slowdown of 2%. It was also discovered that the energy usage of the programs using our DVS algorithm is within 6% of the theoretical lower bound. To the best of our knowledge, this is one of the first works to evaluate DVS algorithms by physical measurements.
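As a rough consistency check on the averages quoted above, the energy-delay product (EDP) can be computed as energy times execution time, both taken relative to the unoptimized baseline. The sketch below is illustrative only: the function name `edp_ratio` is an assumption, and multiplying the two reported averages only approximates the paper's figure, which is presumably averaged per benchmark.

```python
def edp_ratio(energy_ratio: float, delay_ratio: float) -> float:
    """Energy-delay product relative to the baseline (1.0 means unchanged)."""
    return energy_ratio * delay_ratio


# Reported averages: 11% less energy, 2% longer execution time.
avg_edp = edp_ratio(1 - 0.11, 1 + 0.02)
edp_reduction_percent = (1 - avg_edp) * 100

print(f"relative EDP: {avg_edp:.4f}")
print(f"EDP reduction: {edp_reduction_percent:.1f}%")
```

The result is a reduction of roughly 9%, in line with the figure stated in the abstract.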

acm sigplan symposium on principles and practice of parallel programming | 1991

A static performance estimator to guide data partitioning decisions

Vasanth Balasundaram; Geoffrey C. Fox; Ken Kennedy; Ulrich Kremer

The choice of the data domain partitioning scheme is an important factor in determining the available parallelism and hence the performance of an application on a distributed memory multiprocessor. In this paper, we present a performance estimator for statically evaluating the relative efficiency of different data partitioning schemes for any given program on any given distributed memory multiprocessor. Our method is not based on a theoretical machine model, but instead uses a set of kernel routines to “train” the estimator for each target machine. We also describe a prototype implementation of this technique and discuss an experimental evaluation of its accuracy.

ACM Transactions on Programming Languages and Systems | 1998

Automatic data layout for distributed-memory machines

Ken Kennedy; Ulrich Kremer

The goal of languages like Fortran D or High Performance Fortran (HPF) is to provide a simple yet efficient machine-independent parallel programming model. After the algorithm selection, the data layout choice is the key intellectual challenge in writing an efficient program in such languages. The performance of a data layout depends on the target compilation system, the target machine, the problem size, and the number of available processors. This makes the choice of a good layout extremely difficult for most users of such languages. If languages such as HPF are to find general acceptance, the need for data layout selection support has to be addressed. We believe that the appropriate way to provide the needed support is through a tool that generates data layout specifications automatically. This article discusses the design and implementation of a data layout selection tool that generates HPF-style data layout specifications automatically. Because layout is done in a tool that is not embedded in the target compiler and hence will be run only a few times during the tuning phase of an application, it can use techniques such as integer programming that may be considered too computationally expensive for inclusion in production compilers. The proposed framework for automatic data layout selection builds and examines search spaces of candidate data layouts. A candidate layout is an efficient layout for some part of the program. After the generation of search spaces, a single candidate layout is selected for each program part, resulting in a data layout for the entire program. A good overall data layout may require the remapping of arrays between program parts. A performance estimator based on a compiler model, an execution model, and a machine model is needed to predict the execution time of each candidate layout and the costs of possible remappings between candidate data layouts.
In the proposed framework, instances of NP-complete problems are solved during the construction of candidate layout search spaces and the final selection of candidate layouts from each search space. Rather than resorting to heuristics, the framework capitalizes on state-of-the-art 0-1 integer programming technology to compute optimal solutions of these NP-complete problems. A prototype data layout assistant tool based on our framework has been implemented as part of the D system currently under development at Rice University. The article reports preliminary experimental results. The results indicate that the framework is efficient and allows the generation of data layouts of high quality.
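The selection step described above, picking one candidate layout per program part so that estimated execution time plus remapping cost is minimal, can be illustrated with a toy sketch. This is not the article's method: it swaps in exhaustive enumeration for 0-1 integer programming, and the function name `select_layouts`, the cost tables, and their values are all hypothetical.

```python
from itertools import product


def select_layouts(exec_cost, remap_cost):
    """Choose one candidate layout per program part by brute force.

    exec_cost[p][c]     : estimated time of part p under its candidate c
    remap_cost[p][a][b] : cost of remapping from candidate a of part p
                          to candidate b of part p + 1
    Returns (tuple of chosen candidate indices, total estimated cost).
    """
    best_choice, best_cost = None, float("inf")
    for choice in product(*(range(len(costs)) for costs in exec_cost)):
        cost = sum(exec_cost[p][c] for p, c in enumerate(choice))
        cost += sum(
            remap_cost[p][choice[p]][choice[p + 1]]
            for p in range(len(choice) - 1)
        )
        if cost < best_cost:
            best_choice, best_cost = choice, cost
    return best_choice, best_cost


# Hypothetical example: two program parts, two candidate layouts each.
exec_cost = [[10, 14], [20, 9]]
remap_cost = [[[0, 8], [8, 0]]]  # switching layouts between the parts costs 8

choice, total = select_layouts(exec_cost, remap_cost)
print(choice, total)
```

In this made-up instance the cheapest layout for the first part taken alone (candidate 0) is not part of the best overall choice, because remapping between parts outweighs the local gain. Enumeration grows exponentially with the number of parts, which is one reason a real tool would turn to integer programming instead.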

languages and compilers for parallel computing | 2001

A compilation framework for power and energy management on mobile computers

Ulrich Kremer; Jamey Hicks; James M. Rehg

Power and energy management is crucial for mobile devices that rely on battery power. In addition to voice recognition, image understanding is an important class of applications for mobile environments. We propose a new compilation strategy for remote task mapping, and report experimental results for a face detection and recognition system. Our compilation strategy generates two versions of the input program, one to be executed on the mobile device (client), and the other on a machine connected to the mobile device via a wireless network (server). Compiler supported checkpointing is used to allow the client to monitor program progress on the server, and to request checkpoint data in case of anticipated server and/or network failure. The reported results have been obtained by actual power measurements, not simulation. Experiments show energy savings of up to one order of magnitude on the mobile machine. A prototype implementation of the discussed compilation framework is underway, and preliminary results are reported.

languages compilers and tools for embedded systems | 2002

Energy-conscious compilation based on voltage scaling

Hendra Saputra; Mahmut T. Kandemir; Narayanan Vijaykrishnan; Mary Jane Irwin; Jie S. Hu; Chung-Hsing Hsu; Ulrich Kremer

As energy consumption has become a major constraint in current system design, it is essential to look beyond the traditional low-power circuit and architectural optimizations. Further, software is becoming an increasing portion of embedded/portable systems. Consequently, optimizing the software in conjunction with the underlying low-power hardware features such as voltage scaling is vital. In this paper, we present two compiler-directed energy optimization strategies based on voltage scaling: static voltage scaling and dynamic voltage scaling. In static voltage scaling, the compiler determines a single supply voltage level for the entire input program. We primarily aim at improving the energy consumption of a given code without increasing its execution time. To accomplish this, we employ classical loop-level compiler optimizations. However, we use these optimizations to create opportunities for voltage scaling to save energy, rather than increase program performance. In dynamic voltage scaling, the compiler can select different supply voltage levels for different parts of the code. Our compilation strategy is based on integer linear programming and can accommodate energy/performance constraints. For a benchmark suite of array-based scientific codes and embedded video/image applications, our experiments show average energy savings of 31.8% when static voltage scaling is used. Our dynamic voltage scaling strategy saves 15.3% more energy than static voltage scaling when invoked under the same performance constraints.

languages and compilers for parallel computing | 1991

An Overview of the Fortran D Programming System

Seema Hiranandani; Ken Kennedy; Charles Koelbel; Ulrich Kremer; Chau-Wen Tseng

The success of large-scale parallel architectures is limited by the difficulty of developing machine-independent parallel programs. We have developed Fortran D, a version of Fortran extended with data decomposition specifications, to provide a portable data-parallel programming model. This paper presents the design of two key components of the Fortran D programming system: a prototype compiler and an environment to assist automatic data decomposition. The Fortran D compiler addresses program partitioning, communication generation and optimization, data decomposition analysis, run-time support for unstructured computations, and storage management. The Fortran D programming environment provides a static performance estimator and an automatic data partitioner. We believe that the Fortran D programming system will significantly ease the task of writing machine-independent data-parallel programs.

international conference on distributed computing systems | 2004

Spatial programming using smart messages: design and implementation

Cristian Borcea; Chalermek Intanagonwiwat; Porlin Kang; Ulrich Kremer; Liviu Iftode

Spatial programming (SP) is a space-aware programming model for outdoor distributed embedded systems. Central to SP are the concepts of space and spatial reference, which provide applications with a virtual resource naming in networks of embedded systems. A network resource is referenced using its expected physical location and properties. Together with other SP features, such as reference consistency and access timeout, they help programmers cope with highly dynamic network configurations in a network-transparent fashion. We present the SP design and its implementation using smart messages, a lightweight software architecture similar to mobile agents, that we developed for networks of embedded systems. We also describe the implementation and evaluation of a simple SP application over a testbed consisting of HP iPAQs running Linux and equipped with 802.11 cards for wireless communication. The experimental results indicate that SP is a viable programming model for outdoor distributed computing.

international conference on parallel architectures and compilation techniques | 2002

Application transformations for energy and performance-aware device management

Taliver Heath; Eduardo Pinheiro; Jerry Hom; Ulrich Kremer; Ricardo Bianchini

Energy conservation without performance degradation is an important goal for battery-operated computers, such as laptops and handheld assistants. In this paper we determine the potential benefits of application-supported device management for optimizing energy and performance. In particular, we consider application transformations that increase device idle times and inform the operating system about the length of each upcoming period of idleness. We assess the potential energy and performance benefits of this type of application support for a laptop disk. Furthermore, we propose and evaluate a compiler framework for performing the transformations automatically for a disk device. Our experimental results demonstrate that unless applications are transformed, they cannot accrue any of the predicted benefits. In addition, they show that our compiler can produce almost the same performance and energy results that we obtain by hand-modifying applications. Overall, we find that the transformations we propose can reduce disk energy consumption by 55% to 89% with only a small degradation in performance.

distributed memory computing conference | 1990

An Interactive Environment for Data Partitioning and Distribution

Vasanth Balasundaram; Geoffrey C. Fox; Ken Kennedy; Ulrich Kremer

An approach to distributed memory parallel programming that has recently become popular is one where the programmer explicitly specifies the data decomposition using language extensions, and a compiler generates all the communication. While this frees the programmer from the tedium of thinking about message-passing, no assistance is provided in determining the data decomposition scheme that gives the best performance on the target machine. In this paper, we propose an interactive software tool that provides assistance for this very task. The proposed tool also computes performance estimates for any chosen data partitioning scheme, allowing the programmer to experiment with several different strategies without ever running the program on the machine.

The Computer Journal | 2004

Smart Messages: A Distributed Computing Platform for Networks of Embedded Systems

Porlin Kang; Cristian Borcea; Gang Xu; Akhilesh Saxena; Ulrich Kremer; Liviu Iftode

In this paper, we present the design and implementation of Smart Messages, a distributed computing platform for networks of embedded systems based on execution migration. A Smart Message (SM) is a user-defined distributed program which executes on nodes of interest, named by their properties, and uses an explicit lightweight migration to reach these nodes. During migrations, an SM carries its code and execution state, and it self-routes at each intermediate node between two nodes of interest. The nodes in the network cooperate to support the SM execution by providing a virtual machine and a shared memory region addressable by names (tag space). To illustrate the flexibility of SMs to program real-world applications, we describe EZCab, an application for booking cabs in densely populated urban areas. We also present experimental results to quantify the performance achieved by the SM prototype.