Network


Latest external collaboration on country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Chuanjun Zhang is active.

Publication


Featured researches published by Chuanjun Zhang.


international symposium on computer architecture | 2003

A highly configurable cache architecture for embedded systems

Chuanjun Zhang; Frank Vahid; Walid A. Najjar

Energy consumption is a major concern in many embedded computing systems. Several studies have shown that cache memories account for about 50% of the total energy consumed in these systems. The performance of a given cache architecture is largely determined by the behavior of the application using that cache. Desktop systems have to accommodate a very wide range of applications and therefore the manufacturer usually sets the cache architecture as a compromise given current applications, technology and cost. Unlike desktop systems, embedded systems are designed to run a small range of well-defined applications. In this context, a cache architecture that is tuned for that narrow range of applications can have both increased performance as well as lower energy consumption. We introduce a novel cache architecture intended for embedded microprocessor platforms. The cache can be configured by software to be direct-mapped, two-way, or four-way set associative, using a technique we call way concatenation, having very little size or performance overhead. We show that the proposed cache architecture reduces energy caused by dynamic power compared to a way-shutdown cache. Furthermore, we extend the cache architecture to also support a way shutdown method designed to reduce the energy from static power that is increasing in importance in newer CMOS technologies. Our study of 23 programs drawn from Powerstone, MediaBench and Spec2000 show that tuning the caches configuration saves energy for every program compared to conventional four-way set-associative as well as direct mapped caches, with average savings of 40% compared to a four-way conventional cache.


ACM Transactions in Embedded Computing Systems | 2005

A highly configurable cache for low energy embedded systems

Chuanjun Zhang; Frank Vahid; Walid A. Najjar

Energy consumption is a major concern in many embedded computing systems. Several studies have shown that cache memories account for about 50% of the total energy consumed in these systems. The performance of a given cache architecture is determined, to a large degree, by the behavior of the application executing on the architecture. Desktop systems have to accommodate a very wide range of applications and therefore the cache architecture is usually set by the manufacturer as a best compromise given current applications, technology, and cost. Unlike desktop systems, embedded systems are designed to run a small range of well-defined applications. In this context, a cache architecture that is tuned for that narrow range of applications can have both increased performance as well as lower energy consumption. We introduce a novel cache architecture intended for embedded microprocessor platforms. The cache has three software-configurable parameters that can be tuned to particular applications. First, the caches associativity can be configured to be direct-mapped, two-way, or four-way set-associative, using a novel technique we call way concatenation. Second, the caches total size can be configured by shutting down ways. Finally, the caches line size can be configured to have 16, 32, or 64 bytes. A study of 23 programs drawn from Powerstone, MediaBench, and Spec2000 benchmark suites shows that the configurable cache tuned to each program saved energy for every program compared to a conventional four-way set-associative cache as well as compared to a conventional direct-mapped cache, with an average savings of energy related to memory access of over 40%.


design, automation, and test in europe | 2004

A self-tuning cache architecture for embedded systems

Chuanjun Zhang; Frank Vahid; Roman L. Lysecky

Memory accesses can account for about half of a microprocessor systems power consumption. Customizing a microprocessor caches total size, line size and associativity to a particular program is well known to have tremendous benefits for performance and power. Customizing caches has until recently been restricted to core-based flows, in which a new chip will be fabricated. However, several configurable cache architectures have been proposed recently for use in pre-fabricated microprocessor platforms. Tuning those caches to a program is still however a cumbersome task left for designers, assisted in part by recent computer-aided design (CAD) tuning aids. We propose to move that CAD on-chip, which can greatly increase the acceptance of configurable caches. We introduce on-chip hardware implementing an efficient cache tuning heuristic that can automatically, transparently, and dynamically tune the cache to an executing program. We carefully designed the heuristic to avoid any cache flushing, since flushing is power and performance costly. By simulating numerous Powerstone and MediaBench benchmarks, we show that such a dynamic self-tuning cache can reduce memory-access energy by 45% to 55% on average, and as much as 97%, compared with a four-way set-associative base cache, completely transparently to the programmer.


rapid system prototyping | 2003

Cache configuration exploration on prototyping platforms

Chuanjun Zhang; Frank Vahid

We describe cache architecture, intended for prototype-oriented IC platforms, that automatically finds the best cache configuration for a particular application. The cache itself can be configured with respect to the total size, associativity, line size, and way prediction. The cache architecture includes an explorer component that efficiently searches the large space of possible configurations for the set of points representing meaningful tradeoffs between performance and energy - the Pareto-optimal set. We provide results of experiments showing that the architecture effectively finds a good set of Pareto points for numerous Powerstone and MediaBench embedded system benchmarks. Our architecture eliminates the need for time-consuming simulations to determine the best cache configuration, and imposes little power overhead and reasonable size overhead.


ieee computer society annual symposium on vlsi | 2003

Energy benefits of a configurable line size cache for embedded systems

Chuanjun Zhang; Frank Vahid; Walid A. Najjar

Previous work has shown that cache line sizes impact performance differently for different desktop programs; some programs work better with small line sizes, others with larger line sizes. Typical processors come with a line size that is a compromise, working best on the average for a variety of programs. We analyze the energy impact of different line sizes, for 19 embedded system benchmarks, and we show that tuning the line size to a particular program can reduce memory access energy by 50% in some examples. Our data argues strongly for the need for embedded microprocessors to have configurable line size caches, and for embedded system designers to put effort into choosing the best line size for their programs.


design, automation, and test in europe | 2004

Using a victim buffer in an application-specific memory hierarchy

Chuanjun Zhang; Frank Vahid

Customizing a memory hierarchy to a particular application or applications is becoming increasingly common in embedded system design, with one benefit being reduced energy. Adding a victim buffer to the memory hierarchy is known to reduce energy and improve performance on average, yet victim buffers are not typically found in commercial embedded processors. One problem with such buffers is, while they work well on average, they tend to hurt performance for many applications. We show that a victim buffer can be very effective if it is considered as a parameter in designing a memory hierarchy, like the traditional cache parameters of total size, associativity, and line size. We describe experiments on PowerStone and MediaBench benchmarks, showing that having the option of adding a victim buffer to a direct-mapped cache can reduce memory-access energy by a factor of 3 in some cases. Furthermore, even when other cache parameters are configurable, we show that a victim buffer can still reduce energy by 43%. By treating the victim buffer as a parameter, meaning the buffer can be included or excluded, we can avoid performance overhead of up to 4% on some examples. We discuss the victim buffer in the context of both core-based and pre-fabricated platform based design approaches.


design, automation, and test in europe | 2004

Low static-power frequent-value data caches

Chuanjun Zhang; Jun Yang; Frank Vahid

Static energy dissipation in cache memories will constitute an increasingly larger portion of total microprocessor energy dissipation due to nanoscale technology characteristics and the large size of on-chip caches. We propose to reduce the static energy dissipation of an on-chip data cache by taking advantage of the frequent values (FV) that widely exist in a data cache memory. The original FV-based low-power cache design aimed at only reducing dynamic power, at the cost of a 5% slowdown. We propose a better design that reduces both static and dynamic cache power, and that uses a circuit design that eliminates performance overhead. A designer can utilize our architecture by simulating an application and then synthesizing the FVs into an application-specific cache design when values will not change, or by simulating and then writing to an FV-cache with configuration registers when values could change. Furthermore, we describe hardware that can dynamically determine FVs and write to the configuration registers completely transparently. Experiments on 11 Spec 2000 benchmarks show that, in addition to the dynamic power savings, 33% static energy savings for data caches can be achieved.


Microelectronics Journal | 2003

Highly configurable platforms for embedded computing systems

Frank Vahid; Roman L. Lysecky; Chuanjun Zhang; Greg Stitt

Platform chips, which are pre-designed chips possessing numerous processors, memories, coprocessors, and field-programmable gates arrays, are becoming increasingly popular. Platforms eliminate the costs and risks associated with creating customized chips, but with the drawbacks of poorer performance and energy consumption. Making platforms highly configurable, so they can be tuned to the particular applications that will execute on those platforms, can help reduce those drawbacks. We discuss the trends leading embedded system designers towards the use of platforms instead of customized chips. We discuss UCR research in designing highly configurable platforms, highlighting some of our work in highly configurable caches, and in hardware/software partitioning.


international symposium on circuits and systems | 2002

A power-configurable bus for embedded systems

Chuanjun Zhang; Frank Vahid

Pre-designed configurable platforms, possessing microprocessors, memories, and numerous peripherals on a single chip, are increasing in popularity in embedded system design. Platform configurability enables use in more products, which results in lower cost platforms due to higher volume production. Common configurable features include voltage scaling and cache organization. We introduce a new bus architecture that can be configured for normal performance or low power. The additional hardware that provides this configurability imposes almost no performance overhead in normal mode compared to a non-configurable bus. We show the substantial range of power and performances that such a bus provides, and we show that the low-power configuration can result in energy savings, using a set of benchmarks.


Archive | 2004

Tuning Caches to Applications for Low-Energy Embedded Systems

Ann Gordon-Ross; Chuanjun Zhang; Frank Vahid; Nikil D. Dutt

The power consumed by the memory hierarchy of a microprocessor can contribute to as much as 50% of the total microprocessor system power, and is thus a good candidate for power and energy optimizations. We discuss four methods for tuning a microprocessors’ cache subsystem to the needs of any executing application for low-energy embedded systems. We introduce onchip hardware implementing an efficient cache tuning heuristic that can automatically, transparently, and dynamically tune a configurable level-one cache’s total size, associativity and line size to an executing application. We extend the single-level cache tuning heuristic for a two-level cache using a methodology applicable to both a simulation-based exploration environment and a hardware-based system prototyping environment. We show that a victim buffer can be very effective as a configurable parameter in a memory hierarchy. We reduce static energy dissipation of on-chip data cache by compressing the frequent values that widely exist in a data cache memory.

Collaboration


Dive into the Chuanjun Zhang's collaboration.

Top Co-Authors

Avatar

Frank Vahid

University of California

View shared research outputs
Top Co-Authors

Avatar
Top Co-Authors

Avatar

Jun Yang

University of Pittsburgh

View shared research outputs
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Researchain Logo
Decentralizing Knowledge