Bill Moyer | Researchain

Publication

Featured research published by Bill Moyer.

international symposium on low power electronics and design | 2000

A low power unified cache architecture providing power and performance flexibility (poster session)

Afzal Malik; Bill Moyer; Dan Cermak

Advances in technology have allowed portable electronic devices to become smaller and more complex, placing stringent power and performance requirements on the device's components. The M•CORE M3 architecture was developed specifically for these embedded applications. To address the growing need for longer battery life and higher performance, an 8-Kbyte, 4-way set-associative, unified (instruction and data) cache with programmable features was added to the M3 core. These features allow the architecture to be optimized based on the application's requirements. In this paper, we focus on the features of the M340 cache sub-system and illustrate the effect on power and performance through benchmark analysis and actual silicon measurements.
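To make the cache geometry concrete, here is a minimal sketch of how an address might be split into tag, set index, and byte offset for an 8-Kbyte, 4-way set-associative cache. The 16-byte line size and 32-bit address width are assumptions, since the abstract states neither; because the cache is unified, instruction and data addresses go through the same split.

```python
# Address split for an 8-Kbyte, 4-way set-associative unified cache.
# LINE_BYTES and ADDR_BITS are assumptions, not values from the abstract.
CACHE_BYTES = 8 * 1024    # capacity, from the abstract
WAYS = 4                  # associativity, from the abstract
LINE_BYTES = 16           # assumed line size
ADDR_BITS = 32            # assumed address width

NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)      # 8192 / 64 = 128 sets
OFFSET_BITS = LINE_BYTES.bit_length() - 1          # log2(16) = 4
INDEX_BITS = NUM_SETS.bit_length() - 1             # log2(128) = 7
TAG_BITS = ADDR_BITS - INDEX_BITS - OFFSET_BITS    # 32 - 7 - 4 = 21


def split_address(addr: int) -> tuple[int, int, int]:
    """Return (tag, set_index, byte_offset) for an instruction or data address."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


if __name__ == "__main__":
    print(NUM_SETS, OFFSET_BITS, INDEX_BITS, TAG_BITS)  # 128 4 7 21
    print(split_address(0x1234))                        # (2, 35, 4)
```

Each of the 128 sets holds four lines, so a lookup compares the 21-bit tag against up to four stored tags in the selected set.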

international symposium on low power electronics and design | 1999

Instruction fetch energy reduction using loop caches for embedded applications with small tight loops

Lea Hwang Lee; Bill Moyer; John Arends

A fair amount of work has been done in recent years on reducing power consumption in caches by using a small instruction buffer placed between the execution pipe and a larger main cache. These techniques, however, often degrade the overall system performance. In this paper, we propose using a small instruction buffer, also called a loop cache, to save power. A loop cache has no address tag store. It consists of a direct-mapped data array and a loop cache controller. The loop cache controller knows precisely whether the next instruction request will hit in the loop cache, well ahead of time. As a result, there is no performance degradation.
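The claim that the controller knows a hit "well ahead of time" can be illustrated with a small Python model of a tagless, direct-mapped loop cache. It assumes 16-bit instructions, a configurable number of entries, a straight-line loop body, and a controller that learns the loop bounds from a taken short backward branch and treats the loop as resident after one complete fill iteration. The IDLE/FILL/ACTIVE states and all names are hypothetical; the point is that `will_hit` is decided from the loop bounds alone, with no tag comparison.

```python
class LoopCache:
    """Tagless, direct-mapped loop cache with a bounds-based controller
    (illustrative model; states and policy are assumptions)."""

    def __init__(self, entries: int = 32, instr_bytes: int = 2):
        self.entries = entries            # assumed capacity, in instructions
        self.instr_bytes = instr_bytes    # assumed 16-bit instructions
        self.data = [None] * entries      # data array only: no tag store
        self.state = "IDLE"               # IDLE -> FILL -> ACTIVE
        self.start = self.end = -1        # loop bounds (branch target, branch PC)

    def _slot(self, pc: int) -> int:
        # Direct-mapped: low-order instruction-address bits select the entry.
        return (pc // self.instr_bytes) % self.entries

    def on_taken_backward_branch(self, branch_pc: int, target_pc: int) -> None:
        size = (branch_pc - target_pc) // self.instr_bytes + 1
        same_loop = (self.start, self.end) == (target_pc, branch_pc)
        if size > self.entries:
            self.state = "IDLE"           # loop too large to capture
        elif same_loop and self.state == "FILL":
            self.state = "ACTIVE"         # one full iteration is now stored
        elif not (same_loop and self.state == "ACTIVE"):
            self.state, self.start, self.end = "FILL", target_pc, branch_pc

    def will_hit(self, pc: int) -> bool:
        # Known before the access: only the loop bounds are checked.
        return self.state == "ACTIVE" and self.start <= pc <= self.end

    def fetch(self, pc: int, memory: dict) -> tuple:
        if self.state != "IDLE" and not (self.start <= pc <= self.end):
            self.state = "IDLE"           # control flow left the loop
        if self.will_hit(pc):
            return self.data[self._slot(pc)], True
        word = memory[pc]                 # fetch from the main cache/memory
        if self.state == "FILL":
            self.data[self._slot(pc)] = word
        return word, False


if __name__ == "__main__":
    start, end = 0x100, 0x106             # 4-instruction loop, branch at end
    memory = {pc: f"insn@{pc:#x}" for pc in range(0x100, 0x10A, 2)}
    lc = LoopCache(entries=8)
    hits = 0
    for it in range(5):
        for pc in range(start, end + 2, 2):
            hits += lc.fetch(pc, memory)[1]
        if it < 4:                        # branch taken on all but the last pass
            lc.on_taken_backward_branch(end, start)
    print(hits)  # 12: passes 3 to 5 are served entirely by the loop cache
```

The first pass only detects the loop and the second fills the array; because a captured loop never exceeds the array size, its instructions cannot conflict in the direct-mapped array, which is why no tags are needed in this model.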

Proceedings of the IEEE | 2001

Low-power design for embedded processors

Bill Moyer

Minimization of power consumption in portable and battery powered embedded systems has become an important aspect of processor and system design. Opportunities for power optimization and tradeoffs emphasizing low power are available across the entire design hierarchy. A review of low-power techniques applied at many levels of the design hierarchy is presented, and an example of low-power processor architecture is described along with some of the design decisions made in implementation of the architecture.
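Many of the techniques such a review covers target the standard first-order CMOS dynamic-power relation below, which is general background rather than something stated in the abstract: $\alpha$ is the switching activity, $C_L$ the switched capacitance, $V_{DD}$ the supply voltage, and $f$ the clock frequency. Because power is quadratic in supply voltage, halving $V_{DD}$ at a fixed frequency cuts dynamic power to a quarter.

```latex
P_{\mathrm{dyn}} = \alpha \, C_L \, V_{DD}^{2} \, f
\qquad\Longrightarrow\qquad
\frac{P_{\mathrm{dyn}}\big|_{V_{DD}/2}}{P_{\mathrm{dyn}}\big|_{V_{DD}}}
  = \frac{\alpha \, C_L \, (V_{DD}/2)^{2} \, f}{\alpha \, C_L \, V_{DD}^{2} \, f}
  = \frac{1}{4}
```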

international symposium on microarchitecture | 1984

The Motorola MC68020

Douglas B. MacGregor; David S. Mothersole; Bill Moyer

This new 32-bit microprocessor provides high performance, instruction set extensibility, and compatibility with existing M68000 family software.

international symposium on microarchitecture | 1999

Low-cost branch folding for embedded applications with small tight loops

Lea Hwang Lee; Jeff Scott; Bill Moyer; John Arends

Many portable and embedded applications are characterized by spending a large fraction of execution time on small program loops. To improve performance, many embedded systems use special instructions to handle program loop executions. These special instructions, however, consume opcode space, which is valuable in embedded computing environments. In this paper, we propose a hardware technique for folding out branches when executing these small loops. This technique does not require any special branch instructions. It is based on the detection and utilization of certain short backward branch instructions (sbb). An sbb is any PC-relative branch instruction with a limited backward branch distance. Once an sbb is detected, its displacement field is used by the hardware to identify the actual program loop size. It does so by loading this negative displacement field into a counter and incrementing the counter for each instruction sequentially executed. As the count approaches zero, the hardware folds out the sbb by predicting that it is always taken. The hardware overhead for this technique is minimal. Using a 5-bit increment counter, the performance improvement over a set of embedded applications is about 7.5%.
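The counter mechanism can be sketched as a small trace-driven Python model. It assumes displacements are counted in instructions relative to the sbb's own address, that the 5-bit counter covers backward distances of 1 to 32 instructions, and that the fold happens exactly when the count reaches zero (real hardware must act slightly earlier to redirect fetch in time). The function and field names are hypothetical.

```python
COUNTER_BITS = 5                      # 5-bit counter, as in the abstract
MAX_BACKWARD = 2 ** COUNTER_BITS      # assumed reach: 1..32 instructions back


def is_sbb(displacement: int) -> bool:
    """A short backward branch: backward, and within the counter's reach."""
    return -MAX_BACKWARD <= displacement < 0


def simulate_loop(body_len: int, iterations: int) -> dict:
    """Count how a loop closed by one backward branch is handled.

    The loop has `body_len` non-branch instructions followed by the branch,
    whose displacement (in instructions, relative to the branch itself) is
    therefore -body_len.
    """
    displacement = -body_len
    counter = None                    # None: no sbb captured yet
    stats = {"executed": 0, "folded": 0, "mispredicted": 0}
    for it in range(iterations):
        for _ in range(body_len):     # each sequentially executed instruction
            if counter is not None:
                counter += 1          # increments the counter toward zero
        last = it == iterations - 1   # on the last pass the branch falls through
        if counter == 0:
            stats["folded"] += 1      # sbb folded out, predicted taken
            if last:
                stats["mispredicted"] += 1
                counter = None        # loop exited: stop tracking
            else:
                counter = displacement    # reload for the next iteration
        else:
            stats["executed"] += 1    # branch fetched and executed normally
            if not last and is_sbb(displacement):
                counter = displacement    # taken sbb detected: load counter
    return stats


if __name__ == "__main__":
    print(simulate_loop(4, 5))    # {'executed': 1, 'folded': 4, 'mispredicted': 1}
    print(simulate_loop(40, 5))   # too far back to be an sbb: never folded
```

For a 4-instruction body run five times, the first sbb executes normally to load the counter, the remaining four are folded, and only the final, loop-exiting fold is mispredicted under the always-taken prediction.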

compilers, architecture, and synthesis for embedded systems | 2000

A programmable unified cache architecture for embedded applications

Afzal Malik; Bill Moyer; Dan Cermak

Advances in technology have allowed portable electronic devices to become smaller and more complex, placing stringent power and performance requirements on the device’s components. The M•CORE M3 architecture was developed specifically for these embedded applications. To address the growing need for longer battery life and higher performance, an 8-Kbyte, 4-way set-associative, unified (instruction and data) cache with programmable features was added to the M3 core. These features include write mode selection, way management, and buffer enabling/disabling, which allow the architecture to be optimized based on the application’s requirements. In this paper, we present the features of the unified cache architecture and illustrate the effect on power and performance through benchmark analysis and actual silicon measurements.
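A minimal sketch of how the three programmable features might be packed into a control register, together with round-robin replacement restricted to enabled ways. The bit layout, the write-through/copyback encoding, the single buffer-enable bit, and the round-robin policy are all hypothetical; the abstract names the features but not their encoding.

```python
from dataclasses import dataclass

WAYS = 4

# Hypothetical control-register layout, for illustration only:
#   bit 0     write mode: 0 = write-through, 1 = copyback (assumed options)
#   bits 1-4  way enable, one bit per way (way 0 at bit 1)
#   bit 5     buffer enable (a single generic buffer switch)


@dataclass
class CacheConfig:
    copyback: bool = False
    way_enable: int = 0b1111          # disabled ways receive no new lines
    buffer_enable: bool = True

    def to_register(self) -> int:
        return (int(self.copyback)
                | ((self.way_enable & 0xF) << 1)
                | (int(self.buffer_enable) << 5))

    @classmethod
    def from_register(cls, value: int) -> "CacheConfig":
        return cls(copyback=bool(value & 1),
                   way_enable=(value >> 1) & 0xF,
                   buffer_enable=bool((value >> 5) & 1))


def choose_victim(config: CacheConfig, rr_pointer: int) -> tuple:
    """Round-robin replacement restricted to enabled ways (assumed policy).

    Returns (victim_way, next_rr_pointer).
    """
    for step in range(WAYS):
        way = (rr_pointer + step) % WAYS
        if (config.way_enable >> way) & 1:
            return way, (way + 1) % WAYS
    raise ValueError("no ways enabled for allocation")


if __name__ == "__main__":
    cfg = CacheConfig(copyback=True, way_enable=0b0011, buffer_enable=False)
    print(hex(cfg.to_register()))     # 0x7
    print(choose_victim(cfg, 2))      # (0, 1): ways 2 and 3 are skipped
```

Disabling ways in this model only shrinks the set of allocation candidates; whether the real hardware also powers down disabled ways is not stated in the abstract.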

international conference on computer design | 2000

The M•CORE™ M340 unified cache architecture

Afzal Malik; Bill Moyer; Dan Cermak

The M•CORE M340 architecture was designed to target the low-power embedded application market. Building upon the M•CORE M3 core, the M340 provides enhancements through the addition of an 8-Kbyte, 4-way set-associative unified (instruction/data) cache and an on-chip Memory Management Unit (MMU) that contains a single unified 64-entry TLB capable of mapping multiple page sizes. To meet the power and performance requirements that today's portable electronics demand, the M340 provides programmable features that allow the architecture to be optimized for a given application. This paper discusses the features of the M340 cache sub-system and illustrates the power and performance improvements that can be achieved through proper configuration.
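A single TLB that maps multiple page sizes can be sketched as a fully associative array in which each entry carries its own page size and therefore compares only the address bits above its own page offset. The 64-entry capacity is from the abstract; the supported page sizes, round-robin replacement, and all names are hypothetical.

```python
from dataclasses import dataclass

TLB_ENTRIES = 64                                   # from the abstract
PAGE_SIZES = (4 * 1024, 64 * 1024, 1024 * 1024)    # hypothetical size set


@dataclass(frozen=True)
class TlbEntry:
    va_base: int      # virtual page base, aligned to page_size
    pa_base: int      # physical page base, aligned to page_size
    page_size: int


class UnifiedTlb:
    """One fully associative TLB shared by instruction and data accesses."""

    def __init__(self):
        self.entries = []
        self.next_victim = 0          # round-robin replacement (assumed)

    def insert(self, entry: TlbEntry) -> None:
        if entry.page_size not in PAGE_SIZES:
            raise ValueError("unsupported page size")
        if entry.va_base % entry.page_size or entry.pa_base % entry.page_size:
            raise ValueError("page bases must be size-aligned")
        if len(self.entries) < TLB_ENTRIES:
            self.entries.append(entry)
        else:
            self.entries[self.next_victim] = entry
            self.next_victim = (self.next_victim + 1) % TLB_ENTRIES

    def translate(self, va: int):
        """Return the physical address, or None on a TLB miss.

        Assumes software never installs overlapping mappings.
        """
        for e in self.entries:
            offset_mask = e.page_size - 1
            # Each entry compares only the VA bits above its own page offset.
            if (va & ~offset_mask) == e.va_base:
                return e.pa_base | (va & offset_mask)
        return None


if __name__ == "__main__":
    tlb = UnifiedTlb()
    tlb.insert(TlbEntry(0x0040_0000, 0x1000_0000, 1024 * 1024))
    tlb.insert(TlbEntry(0x0000_3000, 0x0008_0000, 4 * 1024))
    print(hex(tlb.translate(0x0042_3456)))   # 0x10023456 (1 MB page)
    print(hex(tlb.translate(0x0000_3ABC)))   # 0x80abc (4 KB page)
```

Larger pages let one entry cover more memory, which is one reason a small unified TLB can still achieve a useful reach.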

compilers, architecture, and synthesis for embedded systems | 2002

Embedded cache architecture with programmable write buffer support for power and performance flexibility

Afzal Malik; Bill Moyer; Roger Zhou

Next-generation portable devices are placing stringent requirements on overall system power and performance. Voice recognition, streaming video, and high-speed wireless internet access are just some of the features being incorporated in these handheld electronic gadgets. The M•CORE M341-S processor has been designed for high-performance and cost-sensitive portable products as well as for high-end embedded control applications. The M341-S obtains increased performance over the M•CORE M2 and M310 families by integrating a unified 16 KB cache and additional instruction pipelining and buffering to increase the operating frequency. An 8-entry programmable write buffer, which can defer pending write misses and write-through accesses, is used to maximize performance. In this paper, we discuss the enhanced cache architecture and the flexible priority scheme for controlling the write buffer. We use a hardware technique that provides a flexible mechanism to control emptying and flushing of the write buffer based on a set of configurable thresholds, as well as a mechanism to alter the priorities from the write buffer to the main memory system. The same unified mechanism is used to support flushing as well as to provide a solution for read-after-write (RAW) hazard avoidance. We present the enhancements made to the M3 core and discuss the effect on power and performance through benchmark analysis and actual silicon measurements.
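The write-buffer behavior can be sketched as an 8-entry FIFO with one configurable threshold. Everything beyond the 8-entry depth is an assumption: the hypothetical `high_water` threshold deciding when draining takes priority over reads, the stall-on-full behavior, and the RAW policy of flushing in order through the youngest matching write before servicing a read, which is one way a single mechanism could serve both flushing and hazard avoidance.

```python
from collections import deque


class WriteBuffer:
    """FIFO write buffer with a programmable drain threshold
    (illustrative model; threshold semantics are assumed)."""

    def __init__(self, depth: int = 8, high_water: int = 6):
        self.depth = depth                # 8 entries, as in the abstract
        self.high_water = high_water      # hypothetical configurable threshold
        self.fifo = deque()               # pending (address, data) writes
        self.memory = {}                  # stands in for the main memory system

    def buffer_has_priority(self) -> bool:
        # Below the threshold, reads win the bus; at or above it,
        # draining the buffer is given priority (assumed policy).
        return len(self.fifo) >= self.high_water

    def drain_one(self) -> None:
        addr, data = self.fifo.popleft()
        self.memory[addr] = data

    def write(self, addr: int, data: int) -> None:
        if len(self.fifo) == self.depth:
            self.drain_one()              # full: retire the oldest entry first
        self.fifo.append((addr, data))

    def flush_through(self, addr: int) -> None:
        """Drain entries in order up to and including the youngest write to addr."""
        last = max((i for i, (a, _) in enumerate(self.fifo) if a == addr),
                   default=-1)
        for _ in range(last + 1):
            self.drain_one()

    def read(self, addr: int) -> int:
        # RAW hazard avoidance reuses the flush mechanism: a read that
        # matches a pending write first flushes the buffer through that entry.
        self.flush_through(addr)
        return self.memory.get(addr, 0)

    def flush_all(self) -> None:
        while self.fifo:
            self.drain_one()


if __name__ == "__main__":
    wb = WriteBuffer()
    for i in range(6):
        wb.write(0x100 + 4 * i, i)
    print(wb.buffer_has_priority())   # True: 6 pending writes reach the threshold
    print(wb.read(0x104))             # 1: entries 0 and 1 are flushed first
    print(len(wb.fifo))               # 4
```

Flushing in FIFO order, rather than forwarding the pending data directly, keeps memory ordering intact at the cost of some read latency; the abstract does not say which trade-off the M341-S makes beyond using one unified mechanism.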

Archive | 1999