Bruce R. Childers | Researchain

Publication

Featured research published by Bruce R. Childers.

IEEE Transactions on Parallel and Distributed Systems | 2003

Scheduling with dynamic voltage/speed adjustment using slack reclamation in multiprocessor real-time systems

Dakai Zhu; Rami G. Melhem; Bruce R. Childers

The high power consumption of modern processors becomes a major concern because it leads to decreased mission duration (for battery-operated systems), increased heat dissipation, and decreased reliability. While many techniques have been proposed to reduce power consumption for uniprocessor systems, there has been considerably less work on multiprocessor systems. In this paper, based on the concept of slack sharing among processors, we propose two novel power-aware scheduling algorithms for task sets with and without precedence constraints executing on multiprocessor systems. These scheduling techniques reclaim the time unused by a task to reduce the execution speed of future tasks and, thus, reduce the total energy consumption of the system. We also study the effect of discrete voltage/speed levels on the energy savings for multiprocessor systems and propose a new scheme of slack reservation to incorporate voltage/speed adjustment overhead in the scheduling algorithms. Simulation and trace-based results indicate that our algorithms achieve substantial energy savings on systems with variable voltage processors. Moreover, processors with a few discrete voltage/speed levels obtain nearly the same energy savings as processors with continuous voltage/speed, and the effect of voltage/speed adjustment overhead on the energy savings is relatively small.
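The slack reclamation idea in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it ignores slack sharing across processors and adjustment overhead, and it assumes a cubic power model (power proportional to speed cubed), which is a common textbook simplification. The function names and the `min_speed` parameter are hypothetical.

```python
def reclaimed_speed(wcet, slack, min_speed=0.1):
    """Normalized speed (1.0 = full speed) for a task that inherits slack.

    wcet  : worst-case execution time of the task at full speed
    slack : time left unused by earlier tasks (negative values count as 0)
    """
    if wcet <= 0:
        raise ValueError("wcet must be positive")
    speed = wcet / (wcet + max(slack, 0.0))
    return max(speed, min_speed)


def energy(wcet, speed):
    """Energy under the assumed model: power = speed**3, time = wcet / speed."""
    return wcet * speed ** 2


def snap_to_level(speed, levels):
    """Choose the lowest discrete level that is still fast enough.

    Rounding up (never down) keeps the task within its time budget.
    """
    candidates = [level for level in levels if level >= speed]
    return min(candidates) if candidates else max(levels)


if __name__ == "__main__":
    s = reclaimed_speed(wcet=4.0, slack=4.0)
    print(s, energy(4.0, s), snap_to_level(s, [0.25, 0.6, 1.0]))
```

Under these assumptions, a task that inherits slack equal to its own worst-case time runs at half speed and uses a quarter of the full-speed energy. Snapping to a discrete level gives up part of that saving.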

symposium on code generation and optimization | 2003

Retargetable and reconfigurable software dynamic translation

Kevin Scott; Naveen Kumar; S. Velusamy; Bruce R. Childers; Jack W. Davidson; Mary Lou Soffa

Software dynamic translation (SDT) is a technology that permits the modification of an executing program's instructions. In recent years, SDT has received increased attention, from both industry and academia, as a feasible and effective approach to solving a variety of significant problems. Despite this increased attention, the task of initiating a new project in software dynamic translation remains a difficult one. To address this concern, and in particular, to promote the adoption of SDT technology into an even wider range of applications, we have implemented Strata, a cross-platform infrastructure for building software dynamic translators. This paper describes Strata's architecture, our experience retargeting it to three different processors, and our use of Strata to build two novel SDT systems - one for safe execution of untrusted binaries and one for fast prototyping of architectural simulators.

design, automation, and test in europe | 2010

Increasing PCM main memory lifetime

Alexandre Peixoto Ferreira; Miao Zhou; Santiago Bock; Bruce R. Childers; Rami G. Melhem; Daniel Mossé

The introduction of Phase-Change Memory (PCM) as a main memory technology has great potential to achieve a large energy reduction. PCM has desirable energy and scalability properties, but its use for main memory also poses challenges such as limited write endurance with at most 10^7 writes per bit cell before failure. This paper describes techniques to enhance the lifetime of PCM when used for main memory. Our techniques are (a) writeback minimization with new cache replacement policies, (b) avoidance of unnecessary writes, by writing only the bit cells that are actually changed, and (c) endurance management with a novel PCM-aware swap algorithm for wear-leveling. A failure detection algorithm is also incorporated to improve the reliability of PCM. With these approaches, the lifetime of a PCM main memory is increased from just a few days to over 8 years.
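Technique (b) above, writing only the bit cells that change, can be sketched as a read-before-write comparison. This is an illustrative model only: the function name and the `width` parameter are assumptions, and real hardware performs the comparison in the memory device rather than in software.

```python
def cells_to_write(old_word, new_word, width=64):
    """Return the bit positions that differ between the stored and new word.

    Only these cells need to be programmed. Unchanged cells are skipped,
    so they do not consume any of their limited write endurance.
    """
    mask = (1 << width) - 1
    diff = (old_word ^ new_word) & mask  # 1 bits mark cells that change
    return [bit for bit in range(width) if (diff >> bit) & 1]


if __name__ == "__main__":
    # 0b1010 -> 0b1001 changes only the two low-order cells.
    print(cells_to_write(0b1010, 0b1001, width=4))
```

Rewriting a word with identical data yields an empty list, so a redundant writeback wears out no cells at all in this model.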

high performance computer architecture | 2012

Improving write operations in MLC phase change memory

Lei Jiang; Bo Zhao; Youtao Zhang; Jun Yang; Bruce R. Childers

Phase change memory (PCM) recently has emerged as a promising technology to meet the fast growing demand for large capacity memory in modern computer systems. In particular, multi-level cell (MLC) PCM that stores multiple bits in a single cell, offers high density with low per-byte fabrication cost. However, despite many advantages, such as good scalability and low leakage, PCM suffers from exceptionally slow write operations, which makes it challenging to be integrated in the memory hierarchy. In this paper, we propose architectural innovations to improve the access time of MLC PCM. Due to cell process variation, composition fluctuation and the relatively small differences among resistance levels, MLC PCM typically employs an iterative write scheme to achieve precise control, which suffers from large write access latency. To address this issue, we propose write truncation (WT) to reduce the number of write iterations with the assistance of an extra error correction code (ECC). We also propose form switch (FS) to reduce the storage overhead of the ECC. By storing highly compressible lines in SLC form, FS improves read latency as well. Our experimental results show that WT and FS improve the effective write/read latency by 57%/28% respectively, and achieve 26% performance improvement over the state of the art.
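A toy model of write truncation helps show why an extra ECC shortens writes. The sketch assumes each cell in a line needs a known number of program-and-verify iterations, and that the ECC can correct up to `ecc_correctable` cells that are left unfinished. Both the function name and that parameter are hypothetical, and the model ignores how the paper actually detects and encodes the unfinished cells.

```python
def write_iterations(cell_iters, ecc_correctable=0):
    """Iterations spent on one line write under a simplified model.

    cell_iters      : iterations each cell needs to reach its target level
    ecc_correctable : number of unfinished cells the ECC is assumed to fix

    Without truncation the write lasts as long as the slowest cell.
    With truncation it stops once the cells still unfinished fit within
    the ECC's correction capability.
    """
    if not cell_iters:
        return 0
    ordered = sorted(cell_iters, reverse=True)  # slowest cells first
    if ecc_correctable >= len(ordered):
        return 1  # at least one iteration is still needed to write anything
    return ordered[ecc_correctable]


if __name__ == "__main__":
    line = [2, 3, 8, 2]
    print(write_iterations(line), write_iterations(line, ecc_correctable=1))
```

In this model a single slow cell dominates the untruncated latency, which is why tolerating even one unfinished cell can shorten the write considerably.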

real-time systems symposium | 2001

Scheduling with dynamic voltage/speed adjustment using slack reclamation in multi-processor real-time systems

Dakai Zhu; Rami G. Melhem; Bruce R. Childers

The power consumption of modern high-performance processors is becoming a major concern because it leads to increased heat dissipation and decreased reliability. While many techniques have been proposed to reduce power consumption for uni-processors, there has been considerably less work on multi-processor systems. In this paper we focus on power-aware scheduling for multi-processor real-time systems. Based on the idea of slack sharing among processors, we propose two novel scheduling algorithms for task sets with and without precedence constraints. These scheduling techniques reclaim the time unused by a task to reduce the execution speed of future tasks, and thus reduce the total energy consumption of the system. Simulation results indicate that our algorithms achieve up to 60% energy savings on multi-processor systems with variable voltage processors.

international conference on software engineering | 2005

Demand-driven structural testing with dynamic instrumentation

Jonathan Misurda; James A. Clause; Juliya L. Reed; Bruce R. Childers; Mary Lou Soffa

Producing reliable and robust software has become one of the most important software development concerns. Testing is a process by which software quality can be assured through the collection of information. While testing can improve software reliability, current tools typically are inflexible and have high overheads, making it challenging to test large software projects. In this paper, we describe a new scalable and flexible framework for testing programs with a novel demand-driven approach based on execution paths to implement test coverage. This technique uses dynamic instrumentation on the binary code that can be inserted and removed on-the-fly to keep performance and memory overheads low. We describe and evaluate implementations of the framework for branch, node and def-use testing of Java programs. Experimental results for branch testing show that our approach has, on average, a 1.6 speed up over static instrumentation and also uses less memory.
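The benefit of probes that can be removed on the fly can be illustrated with a small simulation. This sketch models only removal after the first hit for branch coverage. It does not model the path-based, on-demand insertion the abstract describes, and the class and attribute names are hypothetical.

```python
class DemandDrivenCoverage:
    """Simulates branch-coverage probes that remove themselves once hit."""

    def __init__(self, branch_ids):
        self.active = set(branch_ids)  # branches that still carry a probe
        self.covered = set()           # branches observed at least once
        self.probe_hits = 0            # number of times probe code ran

    def on_branch(self, branch_id):
        # In a real system the probe is patched out of the binary, so the
        # branch runs at full speed afterwards. The membership test below
        # only stands in for "is a probe still installed here?".
        if branch_id in self.active:
            self.probe_hits += 1
            self.covered.add(branch_id)
            self.active.discard(branch_id)


if __name__ == "__main__":
    cov = DemandDrivenCoverage(["b1", "b2", "b3"])
    for branch in ["b1", "b2", "b1", "b1"]:
        cov.on_branch(branch)
    print(sorted(cov.covered), cov.probe_hits, sorted(cov.active))
```

A static probe would run on every one of the four executions in the trace. Here probe code runs only twice, once per newly covered branch.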

high-performance computer architecture | 2011

CloudCache: Expanding and shrinking private caches

Hyunjin Lee; Sangyeun Cho; Bruce R. Childers

The number of cores in a single chip multiprocessor is expected to grow in coming years. Likewise, aggregate on-chip cache capacity is increasing fast and its effective utilization is becoming ever more important. Furthermore, available cores are expected to be underutilized due to the power wall and highly heterogeneous future workloads. This trend makes existing L2 cache management techniques less effective for two problems: increased capacity interference between working cores and longer L2 access latency. We propose a novel scalable cache management framework called CloudCache that creates dynamically expanding and shrinking L2 caches for working threads with fine-grained hardware monitoring and control. The key architectural components of CloudCache are L2 cache chaining, inter- and intra-bank cache partitioning, and a performance-optimized coherence protocol. Our extensive experimental evaluation demonstrates that CloudCache significantly improves performance of a wide range of workloads when all or a subset of cores are occupied.

symposium on code generation and optimization | 2007

Evaluating Indirect Branch Handling Mechanisms in Software Dynamic Translation Systems

Jason D. Hiser; Daniel W. Williams; Wei Hu; Jack W. Davidson; Jason Mars; Bruce R. Childers

Software Dynamic Translation (SDT) systems are used for program instrumentation, dynamic optimization, security, intrusion detection, and many other uses. As noted by many researchers, a major source of SDT overhead is the execution of code which is needed to translate an indirect branch's target address into the address of the translated destination block. This paper discusses the sources of indirect branch (IB) overhead in SDT systems and evaluates several techniques for overhead reduction. Measurements using SPEC CPU2000 show that the appropriate choice and configuration of IB translation mechanisms can significantly reduce the IB handling overhead. In addition, cross-architecture evaluation of IB handling mechanisms reveals that the most efficient implementation and configuration can be highly dependent on the implementation of the underlying architecture.
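One common way to cut this overhead is a small lookup table that caches translations of indirect branch targets. The sketch below shows a direct-mapped version. It is not taken from the paper: the class name, the table size, the `>> 2` index shift, and the `translate` callback are all assumptions made for illustration.

```python
class IndirectBranchCache:
    """Direct-mapped cache from application addresses to translated code."""

    def __init__(self, translate, size=256):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a positive power of two")
        self.translate = translate  # slow path: full translator lookup
        self.mask = size - 1
        self.tags = [None] * size
        self.targets = [None] * size
        self.misses = 0

    def lookup(self, app_addr):
        # Drop the low bits (assumes 4-byte aligned targets) and index.
        idx = (app_addr >> 2) & self.mask
        if self.tags[idx] == app_addr:
            return self.targets[idx]  # fast path: cached translation
        self.misses += 1
        frag_addr = self.translate(app_addr)
        self.tags[idx] = app_addr     # replace whatever occupied the slot
        self.targets[idx] = frag_addr
        return frag_addr


if __name__ == "__main__":
    cache = IndirectBranchCache(lambda addr: addr + 0x1000, size=4)
    print(hex(cache.lookup(0x400)), hex(cache.lookup(0x400)), cache.misses)
```

The table size and index function are the kind of configuration choices the abstract refers to: a table that is too small causes conflict misses, and each miss falls back to the slow translator path.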

ieee computer society annual symposium on vlsi | 2007

Performance of Graceful Degradation for Cache Faults

Hyunjin Lee; Sangyeun Cho; Bruce R. Childers

In sub-90nm technologies, more frequent hard faults pose a serious burden on processor design and yield control. In addition to manufacturing-time chip repair schemes, microarchitectural techniques to make processor components resilient to hard faults become increasingly important. This paper considers defects in cache memory and studies their impact on program performance using a fault degradable cache model. We first describe how defects at the circuit level in cache manifest themselves at the microarchitecture level. We then examine several strategies for masking faults, by disabling faulty resources, such as lines, sets, ways, ports, or even the whole cache. We also propose an efficient cache set remapping scheme to recover lost performance due to failed sets. Using a new simulation tool, called CAFE, we study how the cache faults impact program performance under the various masking schemes.

languages, compilers, and tools for embedded systems | 2003

Predicting the impact of optimizations for embedded systems

Min Zhao; Bruce R. Childers; Mary Lou Soffa

When applying optimizations, a number of decisions are made using fixed strategies, such as always applying an optimization if it is applicable, applying optimizations in a fixed order and assuming a fixed configuration for optimizations such as tile size and loop unrolling factor. While it is widely recognized that these fixed strategies may not be the most appropriate for producing high quality code, especially for embedded systems, there are no general and automatic strategies that do otherwise. In this paper, we present a framework that enables these decisions to be made based on predicting the impact of an optimization, taking into account resources and code context. The framework consists of optimization models, code models and resource models, which are integrated for predicting the impact of applying optimizations. Because data cache performance is important to embedded codes, we focus on cache performance and present an instance of the framework for cache performance in this paper. Since most opportunities for cache improvement come from loop optimizations, we describe code, optimization and cache models tailored to predict the impact of applying loop optimizations for data locality. Experimentally we demonstrate the need to selectively apply optimizations and show the performance benefit of our framework in predicting when to apply an optimization. We also show that our framework can be used to choose the most beneficial optimization when a number of optimizations can be applied to a loop nest. And lastly, we show that we can use the framework to combine optimizations on a loop nest.
