John Arends | Researchain

Publication

Featured research published by John Arends.

International Symposium on Low Power Electronics and Design | 1999

Instruction fetch energy reduction using loop caches for embedded applications with small tight loops

Lea Hwang Lee; Bill Moyer; John Arends

A fair amount of work has been done in recent years on reducing power consumption in caches by using a small instruction buffer placed between the execution pipe and a larger main cache. These techniques, however, often degrade the overall system performance. In this paper, we propose using a small instruction buffer, also called a loop cache, to save power. A loop cache has no address tag store. It consists of a direct-mapped data array and a loop cache controller. The loop cache controller knows precisely whether the next instruction request will hit in the loop cache, well ahead of time. As a result, there is no performance degradation.
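The tagless organization described in this abstract can be illustrated with a minimal behavioral sketch. The abstract does not say how the controller decides when to fill or use the loop cache, so the three-state machine below (IDLE, FILL, ACTIVE), the trigger on a taken backward branch, the class name `LoopCache`, the driver `run`, and the 16-entry size are all assumptions made for illustration. It also assumes a straight-line loop body with no internal branches. Because the controller tracks the PC range of the captured loop, it can tell whether a fetch will hit without any tag comparison.

```python
class LoopCache:
    """Tagless, direct-mapped instruction buffer (illustrative model only)."""

    def __init__(self, size=16):
        self.size = size
        self.data = [None] * size   # direct-mapped data array, no tag store
        self.state = "IDLE"         # hypothetical controller states
        self.lo = self.hi = None    # PC range of the loop being tracked

    def on_taken_backward_branch(self, branch_pc, target_pc):
        same_loop = (target_pc, branch_pc) == (self.lo, self.hi)
        if self.state == "FILL" and same_loop:
            self.state = "ACTIVE"   # loop body fully captured on the last pass
        elif self.state == "ACTIVE" and same_loop:
            pass                    # keep serving the same loop
        elif 0 < branch_pc - target_pc < self.size:
            # New loop that fits in the array: capture it on the next pass.
            self.state, self.lo, self.hi = "FILL", target_pc, branch_pc
        else:
            self.state = "IDLE"

    def fetch(self, pc, main_cache):
        in_loop = self.lo is not None and self.lo <= pc <= self.hi
        if self.state != "IDLE" and not in_loop:
            self.state = "IDLE"     # execution left the loop
        if self.state == "ACTIVE":
            # Hit is known from controller state alone; no tag compare.
            return self.data[pc % self.size], "loop"
        instr = main_cache[pc]
        if self.state == "FILL":
            self.data[pc % self.size] = instr
        return instr, "main"


def run(program, loop_lo, loop_hi, iterations, lc):
    """Execute a program with one loop and record where each fetch came from."""
    sources, pc, remaining = [], 0, iterations
    while pc < len(program):
        _, src = lc.fetch(pc, program)
        sources.append(src)
        if pc == loop_hi and remaining > 1:
            remaining -= 1
            lc.on_taken_backward_branch(pc, loop_lo)
            pc = loop_lo
        else:
            pc += 1
    return sources


program = [f"i{n}" for n in range(8)]   # made-up instruction stream
lc = LoopCache()
sources = run(program, loop_lo=2, loop_hi=5, iterations=4, lc=lc)
print(sources.count("loop"), "of", len(sources), "fetches served by the loop cache")
```

In this model the first iteration detects the loop, the second fills the array, and later iterations are served without touching the main cache, which is where the energy saving would come from.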

International Symposium on Microarchitecture | 1999

Low-cost branch folding for embedded applications with small tight loops

Lea Hwang Lee; Jeff Scott; Bill Moyer; John Arends

Many portable and embedded applications are characterized by spending a large fraction of execution time on small program loops. To improve performance, many embedded systems use special instructions to handle program loop executions. These special instructions, however, consume opcode space, which is valuable in embedded computing environments. In this paper, we propose a hardware technique for folding out branches when executing these small loops. This technique does not require any special branch instructions. It is based on the detection and utilization of certain short backward branch instructions (sbb). An sbb is any PC-relative branch instruction with a limited backward branch distance. Once an sbb is detected, its displacement field is used by the hardware to identify the actual program loop size. It does so by loading this negative displacement field into a counter and incrementing the counter for each instruction sequentially executed. As the count approaches zero, the hardware folds out the sbb by predicting that it is always taken. The hardware overhead for this technique is minimal. Using a 5-bit increment counter, the performance improvement over a set of embedded applications is about 7.5%.
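The counter mechanism in this abstract can be sketched as a simple slot-counting model. Everything specific here is an assumption for illustration: the function name `run_loop`, the `max_distance` limit of 31, one issue slot per instruction, a displacement measured in instructions, and folding exactly when the counter reaches zero. The model also ignores the misprediction at loop exit, so it is not a reproduction of the paper's hardware or its 7.5% result.

```python
def run_loop(disp, iterations, fold=True, max_distance=31):
    """Count issue slots for a loop closed by a short backward branch (sbb).

    disp is the negative branch displacement in instructions (assumed unit).
    """
    if not (-max_distance <= disp < 0):
        raise ValueError("branch distance too large to qualify as an sbb")
    body = -disp        # instructions between the branch target and the sbb
    slots = 0
    counter = None      # stays None until the sbb has been seen taken once
    for _ in range(iterations):
        for _ in range(body):
            slots += 1
            if counter is not None:
                counter += 1            # count up toward zero
        if fold and counter == 0:
            # The sbb is next: predict taken and fold it out (no slot used).
            counter = disp
        else:
            slots += 1                  # sbb executes as a normal branch
            counter = disp if fold else None   # detection: load displacement
    return slots


baseline = run_loop(-4, 10, fold=False)
folded = run_loop(-4, 10, fold=True)
print(baseline, folded)
```

With a four-instruction body and ten iterations, the unfolded loop uses five slots per iteration, while the folded version pays for the sbb only on the first iteration, when it is detected.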

Archive | 2006