Diana Keen
University of California, Davis
Publication
Featured research published by Diana Keen.
international symposium on microarchitecture | 1999
Mark Oskin; Justin Hensley; Diana Keen; Frederic T. Chong; Matthew K. Farrens; Aneet Chopra
This study compares the speed, area, and power of different implementations of Active Pages, an intelligent memory system which helps bridge the growing gap between processor and memory performance by associating simple functions with each page of data. Previous investigations have shown up to 1000X speedups using a block of reconfigurable logic to implement these functions next to each subarray on a DRAM chip. In this study, we show that instruction-level parallelism, not hardware specialization, is the key to the previous success with reconfigurable logic. In order to demonstrate this fact, an Active Page implementation based upon a simplified VLIW processor was developed. Unlike conventional VLIW processors, power and area constraints lead to a design which has a small number of pipeline stages. Our results demonstrate that a four-wide VLIW processor attains comparable performance to that of pure FPGA logic but requires significantly less area and power.
IEEE Transactions on Computers | 2003
Diana Keen; Mark Oskin; Justin Hensley; Frederic T. Chong
The Active Pages model of intelligent memory can speed up data-intensive applications by up to two to three orders of magnitude over conventional systems. A fundamental problem with intelligent memory, however, arises when data cached by the processor is modified by logic in the memory. The Active Page model inherently limits sharing, keeping coherence tractable, but exacerbates saturation problems. We first present a hybrid snoopy/directory protocol for use in Active Pages. Limited sharing allows for a low-latency, low-bandwidth hybrid protocol. A transparent remapping mechanism is added for efficient caching. On smaller data sizes, explicit flushing and hardware coherence exhibit similar performance, but hardware coherence is easier to program and uses less bandwidth. Finally, we examine SMP multiprocessor systems to mitigate saturation effects. As the number of threads increases, the bandwidth needs increase, making hardware coherence even more attractive.
Parallel Processing Letters | 2000
Mark Oskin; Lucian Vlad Lita; Frederic T. Chong; Justin Hensley; Diana Keen
High DRAM densities will make intelligent memory chips a commodity in the next five years [1] [2]. This paper focuses upon a promising model of computation in intelligent memory, Active Pages [3], where computation is associated with each page of memory. Computational hardware scales linearly and inexpensively with data size in this model, reducing the order of many algorithms. This scaling can, for example, reduce linear-time algorithms to O(√N). When page-based intelligent memory chips become commercially available, they will change the way programmers select and utilize algorithms. In this paper, we analyze the asymptotic performance of several common algorithms as problem sizes scale. We also derive the optimal page size, as a function of problem size, for each algorithm running with intelligent memory. Finally, we validate these analyses with simulation results.
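The O(√N) claim above can be illustrated with a toy cost model (an assumption for illustration, not the paper's actual analysis): with page size P, each of the N/P page processors scans its own page in parallel in O(P) time, and the host processor then combines the N/P partial results, for a total cost of roughly P + N/P, which is minimized at P = √N.

```python
import math

def active_page_cost(n, p):
    """Toy cost model: O(p) parallel per-page work plus
    host-side aggregation over the n/p partial results."""
    return p + n / p

def optimal_page_size(n):
    """Minimize p + n/p over p > 0: setting the derivative
    1 - n/p**2 to zero gives p = sqrt(n)."""
    return math.sqrt(n)

n = 1_000_000
p_opt = optimal_page_size(n)
print(p_opt)                       # 1000.0
print(active_page_cost(n, p_opt))  # 2000.0, i.e. O(sqrt(n)) vs n for a serial scan
```

Under this model the optimal page size grows with the square root of the problem size, matching the abstract's statement that the best page size is a function of problem size.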
international conference on computer design | 2000
Mark Oskin; Diana Keen; Justin Hensley; Lucian Vlad Lita; Frederic T. Chong
Active Pages is a page-based model of intelligent memory specifically designed to support virtualized hardware resources. Previous work has shown substantial performance benefits from off loading data-intensive tasks to a memory system that implements Active Pages. With a simple VLIW processor embedded near each page on DRAM, Active Page memory systems achieve up to 1000X speedups over conventional memory systems. In this study, we examine Active Page memories that share, or multiplex, embedded VLIW processors across multiple physical Active Pages. We explore the trade-off between individual page-processor performance and page-level multiplexing. We find that hardware costs of computational logic can be reduced from 31% of DRAM chip area to 12%, through multiplexing, without significant loss in performance. Furthermore, manufacturing defects that disable up to 50% of the page processors can be tolerated through efficient resource allocation and associative multiplexing.
PACS'03 Proceedings of the Third International Conference on Power-Aware Computer Systems | 2003
John Y. Oliver; Ravishankar Rao; Paul Sultana; Jedidiah R. Crandall; Erik Czernikowski; Leslie W. Jones; Dean Copsey; Diana Keen; Venkatesh Akella; Frederic T. Chong
Embedded devices have hard performance targets and severe power and area constraints that depart significantly from our design intuitions derived from general-purpose microprocessor design. This paper describes our initial experiences in designing Synchroscalar, a tile-based embedded architecture targeted for multi-rate signal processing applications. We present a preliminary design of the Synchroscalar architecture and some design space exploration in the context of important signal processing kernels. In particular, we find that synchronous design and substantial global interconnect are desirable in the low-frequency, low-power domain. This global interconnect enables parallelization and reduces processor idle time, which are critical to energy efficient implementations of high bandwidth signal processing. Furthermore, statically-scheduled communication and SIMD computation keep control overheads low and energy efficiency high.
Parallel Processing Letters | 2002
Mark Oskin; Diana Keen; Justin Hensley; Lucian Vlad Lita; Frederic T. Chong
Advances in DRAM density have led to several proposals to perform computation in memory [1] [2] [3]. Active Pages is a page-based model of intelligent memory that can exploit large amounts of parallel computation in data-intensive applications. With a simple VLIW processor embedded near each page on DRAM, Active Page memory systems achieve up to 1000X speedups over conventional memory systems [4]. Active Pages are specifically designed to support virtualized hardware resources. In this study, we examine operating system techniques that allow Active Page memories to share, or multiplex, embedded VLIW processors across multiple physical Active Pages. We explore the trade-off between individual page-processor performance and page-level multiplexing. We find that hardware costs of computational logic can be reduced from 31% of DRAM chip area to 12%, through multiplexing, without significant loss in performance. Furthermore, manufacturing defects that disable up to 50% of the page processors can be tolerated through efficient resource allocation and associative multiplexing.
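The multiplexing idea above can be sketched in a few lines; the round-robin policy and names here are illustrative assumptions, not the associative multiplexing mechanism the paper describes. The point it shows is that when defects disable some page processors, the remaining working processors can still serve every page with balanced load.

```python
def assign_pages(num_pages, processor_ok):
    """Map each Active Page to a working page processor,
    skipping defective processors, in round-robin order so
    load stays balanced (a toy policy for illustration)."""
    working = [i for i, ok in enumerate(processor_ok) if ok]
    if not working:
        raise ValueError("no working page processors")
    return {page: working[page % len(working)] for page in range(num_pages)}

# 8 processors with half disabled by defects; 16 pages are still served.
ok = [True, False, True, False, True, False, True, False]
mapping = assign_pages(16, ok)
loads = {}
for proc in mapping.values():
    loads[proc] = loads.get(proc, 0) + 1
print(loads)  # each of the 4 working processors serves 4 pages
```

Each page's work is time-multiplexed onto its assigned processor, trading per-page performance for a smaller fraction of DRAM area devoted to logic, which is the trade-off the study quantifies.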
Archive | 2004
Diana Keen; Frederic T. Chong; Premkumar T. Devanbu; Matthew K. Farrens; Jeremy Brown; Jennifer Hollfelder; Xiu Ting Zhuang
Rising chip densities have led to dramatic improvements in the cost-performance ratio of processors. At the same time, software costs are burgeoning. Large software systems are expensive to develop and are riddled with errors. Certain types of defects (e.g., those related to memory access, concurrency, and security) are particularly difficult to locate and can have devastating consequences. We believe it is time to explore using some of the increasing silicon real estate to provide extra functionality to support software development. We propose dedicating a portion of these new transistors to provide hardware structures to enhance software development, make debugging more efficient, increase reliability, and provide run-time security.
ACM Sigarch Computer Architecture News | 2002
Diana Keen; Frederic T. Chong
As computing and sensor devices become increasingly small and inexpensive, ubiquitous networks of embedded sensor devices will enable a new class of interesting applications. Specific instances of such systems have been designed and programmed in an ad-hoc, bottom-up manner [1] [2]. This bottom-up approach, however, is not an adequate programming model as such systems become ubiquitous. The largest obstacle to enhancing the programming model is the unreliability of individual components and unknown network topology. Distributed systems have solved this by providing name servers that route messages and replicating these name servers. Our sensor device networks will not have the memory to hold such tables, so our computational elements will not be able to address each other individually. Instead, a virtual network will need to be built on the physical substrate to facilitate communication. Unlike a traditional hardware-software co-design problem, we expect the physical and spatial constraints of the system to be a primary factor in synthesis. We present three case studies of sensor network applications, each representative of application classes with specific attributes in sensor location, motion, and desired aggregate functionality. These examples drive the design of our co-design system. We describe essential components of this system and identify key research challenges in its development.
Archive | 1999
Justin Hensley; Mark Oskin; Diana Keen; Lucian Vlad Lita; Frederic T. Chong
Lecture Notes in Computer Science | 2005
John Y. Oliver; Ravishankar Rao; Paul Sultana; Jedidiah R. Crandall; Erik Czernikowski; Leslie W. Jones; Dean Copsey; Diana Keen; Venkatesh Akella; Frederic T. Chong