
Peter Grun

University of California, Irvine


Publication

Featured research published by Peter Grun.

Design, Automation, and Test in Europe | 1999

EXPRESSION: a language for architecture exploration through compiler/simulator retargetability

Ashok Halambi; Peter Grun; Vijay Ganesh; Asheesh Khare; Nikil D. Dutt; Alexandru Nicolau

We describe EXPRESSION, a language supporting architectural design space exploration for embedded systems-on-chip (SOC) and automatic generation of a retargetable compiler/simulator toolkit. Key features of our language-driven design methodology include: a mixed behavioral/structural representation supporting a natural specification of the architecture; explicit specification of the memory subsystem, allowing novel memory organizations and hierarchies; clean syntax and ease of modification supporting architectural exploration; a single specification supporting consistency and completeness checking of the architecture; and efficient specification of architectural resource constraints allowing extraction of detailed reservation tables for compiler scheduling. We illustrate key features of EXPRESSION through simple examples and demonstrate its efficacy in supporting exploration and automatic software toolkit generation for an embedded SOC codesign flow.

Proceedings of the Sixth International Workshop on Hardware/Software Codesign. (CODES/CASHE'98) | 1998

Memory size estimation for multimedia applications

Peter Grun; Florin Balasa; Nikil D. Dutt

Memory modules dominate the cost, performance, and power of embedded systems that process multidimensional signals, typically present in image and video processing. Therefore, studying the impact of parallelism on memory size is crucial for trading off system performance against area cost to enable intelligent system partitioning and exploration. We propose a memory size estimation method for algorithmic specifications containing multidimensional arrays and parallel constructs, intended as part of a high-level partitioning and exploration methodology. The system designer can trade off estimation accuracy against run time. We present the results of our estimation approach on a number of image and video processing kernels, and discuss some preliminary results on the influence of parallelism on storage requirements.

Design Automation Conference | 2000

Memory aware compilation through accurate timing extraction

Peter Grun; Nikil D. Dutt; Alexandru Nicolau

Memory delays represent a major bottleneck in embedded systems performance. Newer memory modules exhibiting efficient access modes (e.g., page-, burst-mode) partly alleviate this bottleneck. However, such features cannot be efficiently exploited in processor-based embedded systems without memory-aware compiler support. We describe a memory-aware compiler approach that exploits such efficient memory access modes by extracting accurate timing information, allowing the compiler's scheduler to perform global code reordering to better hide the latency of memory operations. Our memory-aware compiler scheduled several benchmarks on the TI C6201 processor architecture interfaced with a 2-bank synchronous DRAM and generated average improvements of 24% over the best possible schedule using a traditional (memory-transparent) optimizing compiler, demonstrating the utility of our memory-aware compilation approach.
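
As a rough illustration of why access order matters for page-mode memories, the sketch below models a single open DRAM page and compares two orderings of the same accesses. It is not the paper's algorithm: the function name, page size, and cycle counts are all assumed values chosen for the example, and a real scheduler would also have to respect data dependencies before reordering anything.

```python
def total_latency(addresses, page_size=1024, hit_cycles=2, miss_cycles=8):
    """Sum access latency for a memory with one open page (assumed model).

    An access to the currently open page costs hit_cycles; any other
    access costs miss_cycles and opens the new page. All parameter
    values are hypothetical, not taken from any datasheet.
    """
    open_page = None
    cycles = 0
    for addr in addresses:
        page = addr // page_size
        cycles += hit_cycles if page == open_page else miss_cycles
        open_page = page
    return cycles


# The same six accesses, alternating between two pages ...
interleaved = [0, 4096, 4, 4100, 8, 4104]
# ... and reordered so that accesses to the same page are adjacent.
grouped = sorted(interleaved)

interleaved_cycles = total_latency(interleaved)
grouped_cycles = total_latency(grouped)
```

Under these assumed timings, every access in the interleaved order misses the open page, while the grouped order pays the miss cost only once per page.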

IEEE Transactions on Very Large Scale Integration Systems | 2003

RTGEN: an algorithm for automatic generation of reservation tables from architectural descriptions

Peter Grun; Ashok Halambi; Nikil D. Dutt; Alexandru Nicolau

Reservation Tables (RTs) have long been used to detect conflicts between operations that simultaneously access the same architectural resource. Traditionally, these RTs have been specified explicitly by the designer. However, the increasing complexity of modern processors makes the manual specification of RTs cumbersome and error-prone. Furthermore, manual specification of such conflict information is infeasible for supporting rapid architectural exploration. In this paper we present an algorithm to automatically generate RTs from a high-level processor description, with the goal of avoiding manual specification of RTs, resulting in more concise architectural specifications and also supporting faster turn-around time in Design Space Exploration. We demonstrate the utility of our approach on a set of experiments using the TI C6201 VLIW DSP and DLX processor architectures, and a suite of multimedia and scientific applications.
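
For readers unfamiliar with reservation tables, the following minimal sketch shows how an RT can be used to detect a resource conflict between two operations. It covers only the conflict check, not the RTGEN generation algorithm, and the table layout, resource names, and cycle numbers are hypothetical.

```python
from typing import Dict, Set

# A reservation table maps each resource to the set of cycles, relative
# to the operation's issue cycle, in which the operation occupies it.
ReservationTable = Dict[str, Set[int]]


def conflicts(rt_a: ReservationTable, issue_a: int,
              rt_b: ReservationTable, issue_b: int) -> bool:
    """Return True if two operations use the same resource in the same cycle."""
    for resource, cycles_a in rt_a.items():
        cycles_b = rt_b.get(resource)
        if not cycles_b:
            continue  # resource not used by the second operation
        busy_a = {issue_a + c for c in cycles_a}
        busy_b = {issue_b + c for c in cycles_b}
        if busy_a & busy_b:
            return True
    return False


# Hypothetical tables for a simple pipeline (illustrative only).
LOAD = {"fetch": {0}, "decode": {1}, "mem_port": {2, 3}}
STORE = {"fetch": {0}, "decode": {1}, "mem_port": {2}}
ADD = {"fetch": {0}, "decode": {1}, "alu": {2}}
```

With these tables, a STORE issued one cycle after a LOAD conflicts on the memory port, whereas an ADD issued one cycle after the LOAD does not conflict on any resource.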

Design, Automation, and Test in Europe | 2001

Access pattern based local memory customization for low power embedded systems

Peter Grun; Nikil D. Dutt; Alexandru Nicolau

Memory accesses represent a major bottleneck in embedded systems power and performance. Traditionally, the local memory relied on a large cache to store all the variables in the application. However, especially in large real-life applications, different types of data exhibit divergent types of locality and access patterns, with diverse locality and bandwidth needs. Traditional caches had to compromise between the different types of locality required by the access patterns, and trade off performance against bandwidth requirement. Instead, our approach customizes the local memory architecture to match the diverse access patterns and locality types present in the application, reducing the main memory bandwidth requirement and significantly improving power consumption without sacrificing performance. Our approach generated an average 30% memory power reduction without degrading performance on a set of large multimedia/general purpose applications and scientific kernels, over the best traditional cache configuration of similar size, demonstrating the utility of our algorithm.

International Symposium on Systems Synthesis | 2001

APEX: access pattern based memory architecture exploration

Peter Grun; Nikil D. Dutt; Alexandru Nicolau

Memory accesses represent a major bottleneck in embedded systems power and performance. Traditionally, designers tried to alleviate this problem by relying on a simple cache hierarchy, or a limited use of special purpose memory modules such as stream buffers. Although real-life applications contain a large number of memory references to a diverse set of data structures, a significant percentage of all memory accesses in the application are generated from a few memory instructions that exhibit predictable, well-known access patterns; this creates an opportunity for memory customization, targeting the needs of these access patterns. We present APEX, an approach that extracts, analyzes and clusters the most active access patterns in the application, and aggressively customizes the memory architecture to match the needs of the application, exploring a wide range of cost, performance and power designs. We use a heuristic to prune the design space, guiding the exploration towards the best cost/gain ratios. We present experiments on a set of large real-life benchmarks, showing significant performance improvements for varied cost and power characteristics, allowing the designer to best target the system goals.

Proceedings 25th EUROMICRO Conference. Informatics: Theory and Practice for the New Millennium | 1999

V-SAT: a visual specification and analysis tool for system-on-chip exploration

Asheesh Khare; Nicolae Savoiu; Ashok Halambi; Peter Grun; Nikil D. Dutt; Alexandru Nicolau

We describe V-SAT, a tool for performing design space exploration of System-On-Chip (SOC) architectures. The key components of V-SAT include EXPRESSION, a language for specification of the architecture, SIMPRESS, a simulator generator for analysis/evaluation of the architecture, and the V-SAT GUI front-end for easy specification and detailed analysis. We give a brief overview of the components (EXPRESSION, SIMPRESS and GUI) and, using an example DLX architecture, demonstrate V-SAT's usefulness in exploration for an embedded SOC codesign flow by specifying and evaluating several modifications to the pipeline structure of the processor. We believe that V-SAT provides a powerful environment, both for early design space exploration and for the detailed design of SOC architectures.

International Symposium on Systems Synthesis | 1999

RTGEN: an algorithm for automatic generation of reservation tables from architectural descriptions

Peter Grun; Ashok Halambi; Nikil D. Dutt; Alexandru Nicolau

Reservation tables (RTs) have long been used to detect conflicts between operations that simultaneously access the same architectural resource. Traditionally, these RTs have been specified explicitly by the designer. However, the increasing complexity of modern processors makes the manual specification of RTs cumbersome and error-prone. Furthermore, manual specification of such conflict information is infeasible for supporting rapid architectural exploration. We present an algorithm to automatically generate RTs from a high-level processor description, with the goal of avoiding manual specification of RTs, resulting in more concise architectural specifications and also supporting faster turn-around time in design space exploration. We demonstrate the utility of our approach on a set of experiments using the TI C6201 VLIW DSP and DLX processor architectures, and a suite of multimedia and scientific applications.

International Conference on Computer Aided Design | 2000

MIST: an algorithm for memory miss traffic management

Peter Grun; Nikil D. Dutt; Alexandru Nicolau

Cache misses represent a major bottleneck in embedded systems performance. Traditionally, compilers optimistically treated all memory accesses as cache hits, relying on the memory controller to account for longer miss delays. However, the memory controller has only a local view of the program, and is not able to efficiently hide the latency of these memory operations. Our compiler technique actively manages cache misses, and performs global miss traffic optimizations, to better hide the latency of the memory operations. Our memory-aware compiler scheduled several benchmarks on the TI C6211 processor architecture with a direct-mapped cache, and generated an average of 61.6% improvement over the best schedule of the traditional (memory-transparent) optimizing compiler, demonstrating the utility of our miss traffic optimization approach.

International Conference on VLSI Design | 2001

Processor-memory co-exploration driven by a Memory-Aware Architecture Description Language

Prabhat Mishra; Peter Grun; Nikil D. Dutt; Alexandru Nicolau

Memory represents a major bottleneck in modern embedded systems. Traditionally, memory organizations for programmable systems assumed a fixed cache hierarchy. With the widening processor-memory gap, more aggressive memory technologies and organizations have appeared, allowing customization of a heterogeneous memory architecture tuned for the application. However, such a processor-memory co-exploration approach critically needs the ability to explicitly capture heterogeneous memory architectures. We present in this paper a language-based approach to explicitly capture the memory subsystem configuration, and perform exploration of the memory architecture to trade off cost against performance. We present a set of experiments using our Memory-Aware Architectural Description Language to drive the exploration of the memory subsystem for the TI C6211 processor architecture, demonstrating a range of cost and performance attributes.
