Dee A. B. Weikle | Researchain

Publication

Featured research published by Dee A. B. Weikle.

IEEE Transactions on Computers | 2000

Dynamic access ordering for streamed computations

Sally A. McKee; William A. Wulf; James H. Aylor; Robert H. Klenke; Maximo H. Salinas; Sung I. Hong; Dee A. B. Weikle

Memory bandwidth is rapidly becoming the limiting performance factor for many applications, particularly for streaming computations such as scientific vector processing or multimedia (de)compression. Although these computations lack the temporal locality of reference that makes traditional caching schemes effective, they have predictable access patterns. Since most modern DRAM components support modes that make it possible to perform some access sequences faster than others, the predictability of the stream accesses makes it possible to reorder them to get better memory performance. We describe a Stream Memory Controller (SMC) system that combines compile-time detection of streams with execution-time selection of the access order and issue. The SMC effectively prefetches read-streams, buffers write-streams, and reorders the accesses to exploit the existing memory bandwidth as much as possible. Unlike most other hardware prefetching or stream buffer designs, this system does not increase bandwidth requirements. The SMC is practical to implement, using existing compiler technology and requiring only a modest amount of special purpose hardware. We present simulation results for fast-page mode and Rambus DRAM memory systems and we describe a prototype system with which we have observed performance improvements for inner loops by factors of 13 over traditional access methods.
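To illustrate why access order matters on page-mode DRAM, here is a minimal sketch under stated assumptions. It is not the SMC's actual scheduling policy: the single-bank, one-open-page model, the page size, the hit and miss costs, and the stream layout are all hypothetical values chosen for illustration.

```python
def access_cost(addresses, page_size=1024, hit_cost=1, miss_cost=4):
    """Cost of an address sequence on a toy DRAM with a single open page.

    Assumption: an access to the currently open page costs `hit_cost`,
    any other access costs `miss_cost` and opens that page.
    """
    open_page = None
    total = 0
    for addr in addresses:
        page = addr // page_size
        total += hit_cost if page == open_page else miss_cost
        open_page = page
    return total


def natural_order(bases, n, stride=8):
    """Interleave the streams element by element, as a simple loop would."""
    return [base + i * stride for i in range(n) for base in bases]


def burst_order(bases, n, burst, stride=8):
    """Touch `burst` consecutive elements of one stream before switching."""
    order = []
    for start in range(0, n, burst):
        for base in bases:
            for i in range(start, min(start + burst, n)):
                order.append(base + i * stride)
    return order


# Three streams placed in different DRAM pages (hypothetical layout).
bases = [0, 1 << 20, 2 << 20]
n = 256

natural = access_cost(natural_order(bases, n))
reordered = access_cost(burst_order(bases, n, burst=16))
print(natural, reordered)
```

Both orders touch exactly the same addresses. In this toy model the element-by-element order misses the open page on every access, while the burst order pays one page miss per burst.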

Modeling, Analysis, and Simulation on Computer and Telecommunication Systems | 1998

Caches as filters: a new approach to cache analysis

Dee A. B. Weikle; Sally A. McKee; William A. Wulf

As the processor-memory performance gap continues to grow, so does the need for effective tools and metrics to guide the design of efficient memory hierarchies to bridge that gap. Aggregate statistics of cache performance can be useful for comparison, but they give us little insight into how to improve the design of a particular component. We propose a different approach to cache analysis, viewing caches as filters, and present two new metrics for analyzing cache behavior: instantaneous hit rate and instantaneous locality. We demonstrate how these measures can give us insight into the reference pattern of an executing program, and show an application of these measures in analyzing the effectiveness of the second-level cache of a particular memory hierarchy.
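The abstract does not define its metrics, so the following is only a rough sketch of the general idea of a time-varying hit rate. It assumes a hit rate computed over a sliding window of recent references; the direct-mapped cache model, the window length, and the trace are hypothetical and may differ from the authors' definitions.

```python
from collections import deque


def windowed_hit_rates(trace, num_lines=4, line_size=16, window=8):
    """Simulate a direct-mapped cache and report a sliding-window hit rate.

    Returns one value per reference: the fraction of hits among the most
    recent `window` references (fewer at the start of the trace).
    """
    tags = [None] * num_lines          # one stored block number per cache line
    recent = deque(maxlen=window)      # 1 for a hit, 0 for a miss
    rates = []
    for addr in trace:
        block = addr // line_size
        index = block % num_lines
        hit = tags[index] == block
        tags[index] = block
        recent.append(1 if hit else 0)
        rates.append(sum(recent) / len(recent))
    return rates


# Hypothetical trace: a streaming phase followed by a small reused loop.
trace = [i * 16 for i in range(8)] + [0, 16, 32, 48] * 4
rates = windowed_hit_rates(trace)
aggregate = rates_total = sum(
    1 for r_prev, r in zip([0.0] + rates, rates) if r > r_prev
)  # not used for the metric; see the note below
print(rates[7], rates[-1])
```

For this trace the aggregate hit rate is 12 hits out of 24 references, yet the windowed value is 0.0 at the end of the streaming phase and 1.0 at the end of the loop, which is the kind of phase behavior an aggregate statistic hides.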

ACM Transactions on Architecture and Code Optimization | 2006

Evaluating trace cache energy efficiency

Michele Co; Dee A. B. Weikle; Kevin Skadron

Future fetch engines need to be energy efficient. Much research has focused on improving fetch bandwidth. In particular, previous research shows that storing concatenated basic blocks to form instruction traces can significantly improve fetch performance. This work evaluates whether this concatenation of basic blocks translates to significant energy-efficiency gains. We compare the processor performance and energy efficiency of trace caches with those of instruction caches. We find that, although trace caches modestly outperform instruction-cache-only alternatives, it is branch-prediction accuracy that really determines performance and energy efficiency. When access delay and area restrictions are considered, our results show that sequential trace caches achieve very similar performance and energy efficiency to instruction cache-based fetch engines. They also show that the trace caches' failure to significantly outperform the instruction cache-based fetch organizations stems from the poorer implicit branch prediction of the next-trace predictor at smaller areas. Because access delay limits the theoretical performance of the evaluated fetch engines, we also propose a novel ahead-pipelined next-trace predictor. Our results show that an STC fetch organization with a three-stage, ahead-pipelined next-trace predictor can achieve 5-17% IPC and 29% ED^2 improvements over conventional, unpipelined organizations.
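As a reminder of how an energy-delay-squared comparison works, here is a small sketch. It assumes the usual definition of the metric as energy times delay squared and treats "improvement" as the relative reduction against a baseline; the energy and delay numbers are invented for illustration and are not taken from the paper.

```python
def ed2(energy, delay):
    """Energy-delay-squared product: energy multiplied by delay squared."""
    return energy * delay ** 2


def relative_improvement(baseline, candidate):
    """Fractional reduction of `candidate` relative to `baseline`."""
    return (baseline - candidate) / baseline


# Hypothetical, normalized values: the candidate design uses 10% more
# energy but finishes in 90% of the baseline's time.
baseline = ed2(energy=1.0, delay=1.0)
candidate = ed2(energy=1.1, delay=0.9)

improvement = relative_improvement(baseline, candidate)
print(f"{improvement:.1%}")
```

Because delay is squared, a design can spend somewhat more energy and still come out ahead on this metric if it is sufficiently faster.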

Archive | 2001