Frank Vahid

University of California, Riverside

Publication

Featured research published by Frank Vahid.

Information Technology | 1994

Specification and design of embedded systems

Daniel D. Gajski; Frank Vahid; Sanjiv Narayan; Jie Gong

1. Introduction. 2. Models and Architectures. 3. Specification Languages. 4. A Specification Example. 5. Translation to VHDL. 6. System Partitioning. 7. Design Quality Estimation. 8. Specification Refinement into Synthesizable Models. 9. System-Design Methodology and Environment. Appendix: Answering Machine in SpecCharts. Bibliography. Index.

International Symposium on Computer Architecture | 2003

A highly configurable cache architecture for embedded systems

Chuanjun Zhang; Frank Vahid; Walid A. Najjar

Energy consumption is a major concern in many embedded computing systems. Several studies have shown that cache memories account for about 50% of the total energy consumed in these systems. The performance of a given cache architecture is largely determined by the behavior of the application using that cache. Desktop systems have to accommodate a very wide range of applications and therefore the manufacturer usually sets the cache architecture as a compromise given current applications, technology and cost. Unlike desktop systems, embedded systems are designed to run a small range of well-defined applications. In this context, a cache architecture that is tuned for that narrow range of applications can have both increased performance and lower energy consumption. We introduce a novel cache architecture intended for embedded microprocessor platforms. The cache can be configured by software to be direct-mapped, two-way, or four-way set associative, using a technique we call way concatenation, having very little size or performance overhead. We show that the proposed cache architecture reduces energy caused by dynamic power compared to a way-shutdown cache. Furthermore, we extend the cache architecture to also support a way shutdown method designed to reduce the energy from static power that is increasing in importance in newer CMOS technologies. Our study of 23 programs drawn from Powerstone, MediaBench and Spec2000 shows that tuning the cache's configuration saves energy for every program compared to conventional four-way set-associative as well as direct-mapped caches, with average savings of 40% compared to a four-way conventional cache.
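As a rough illustration of what changing associativity at a fixed capacity means for address decoding, the sketch below computes the set count and the offset, index, and tag widths for direct-mapped, two-way, and four-way organizations. The 8 KB capacity, 32-byte line, 32-bit address, and the function name `cache_geometry` are assumptions chosen for the example; this is generic cache arithmetic, not the way-concatenation circuit described in the abstract.

```python
def cache_geometry(size_bytes, line_bytes, ways, addr_bits=32):
    """Return set count and address-field widths for a set-associative cache.

    Assumes size_bytes, line_bytes, and ways are all powers of two.
    """
    lines = size_bytes // line_bytes          # total cache lines
    sets = lines // ways                      # lines are grouped into sets of `ways`
    offset_bits = line_bytes.bit_length() - 1 # log2(line_bytes)
    index_bits = sets.bit_length() - 1        # log2(sets)
    tag_bits = addr_bits - index_bits - offset_bits
    return {
        "sets": sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": tag_bits,
    }


# Hypothetical 8 KB cache with 32-byte lines, configured three ways.
for ways in (1, 2, 4):
    print(ways, cache_geometry(8 * 1024, 32, ways))
```

Under these assumed sizes, each doubling of associativity halves the number of sets, so one bit moves from the index field to the tag field while the total capacity stays the same.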

Design Automation Conference | 2003

Dynamic hardware/software partitioning: a first approach

Greg Stitt; Roman L. Lysecky; Frank Vahid

Partitioning an application among software running on a microprocessor and hardware co-processor in on-chip configurable logic has been shown to improve performance and energy consumption in embedded systems. Meanwhile, dynamic software optimization methods have shown the usefulness and feasibility of runtime program optimization, but those optimizations do not achieve as much as partitioning. We introduce a first approach to dynamic hardware/software partitioning. We describe our system architecture and initial on-chip tools, including profiler, decompiler, synthesis, and placement and routing tools for a simplified configurable logic fabric, able to perform dynamic partitioning of real benchmarks. We show speedups averaging 2.6 for five benchmarks taken from Powerstone, Netbench and our own benchmarks.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2002

Platune: a tuning framework for system-on-a-chip platforms

Tony Givargis; Frank Vahid

System-on-a-chip (SOC) platform manufacturers are increasingly adding configurable features that provide power and performance flexibility in order to increase a platform's applicability. This paper presents a framework, called Platune, for performance and power tuning of one such SOC platform. Platune is used to simulate an embedded application that is mapped onto the SOC platform and output performance and power metrics for any configuration of the SOC platform. Furthermore, Platune is used to automatically explore the large configuration space of such an SOC platform. The versatility, in terms of accuracy and speed of exploration, of Platune is demonstrated experimentally using three large benchmark examples. The power estimation techniques for processors, caches, memories, buses, and peripherals combined with the design space exploration algorithm deployed by Platune form a methodology for design of tuning frameworks for parameterized SOC platforms in general.

Design Automation Conference | 2006

Warp Processors

Roman L. Lysecky; Greg Stitt; Frank Vahid

We describe a new processing architecture, known as a warp processor, that utilizes a field-programmable gate array (FPGA) to improve the speed and energy consumption of a software binary executing on a microprocessor. Unlike previous approaches that also improve software using an FPGA but do so using a special compiler, a warp processor achieves these improvements completely transparently and operates from a standard binary. A warp processor dynamically detects the binary's critical regions, reimplements those regions as a custom hardware circuit in the FPGA, and replaces the software region by a call to the new hardware implementation of that region. While not all benchmarks can be improved using warp processing, many can, and the improvements are dramatically better than those achievable by more traditional architecture improvements. The hardest part of warp processing is that of dynamically reimplementing code regions on an FPGA, requiring partitioning, decompilation, synthesis, placement, and routing tools, all having to execute with minimal computation time and data memory so as to coexist on chip with the main processor. We describe the results of developing our warp processor. We developed a custom FPGA fabric specifically designed to enable lean place and route tools, and we developed extremely fast and efficient versions of partitioning, decompilation, synthesis, technology mapping, placement, and routing. Warp processors achieve overall application speedups of 6.3X with energy savings of 66% across a set of embedded benchmark applications. We further show that our tools utilize acceptably small amounts of computation and memory which are far less than traditional tools. Our work illustrates the feasibility and potential of warp processing, and we can foresee the possibility of warp processing becoming a feature in a variety of computing domains, including desktop, server, and embedded applications.

IEEE Transactions on Very Large Scale Integration Systems | 2002

System-level exploration for Pareto-optimal configurations in parameterized system-on-a-chip

Tony Givargis; Frank Vahid; Jörg Henkel

Provides a technique for efficiently exploring the configuration space of a parameterized system-on-a-chip (SOC) architecture to find all Pareto-optimal configurations. These configurations represent the range of meaningful power and performance tradeoffs that are obtainable by adjusting parameter values for a fixed application mapped onto the SOC architecture. The approach extensively prunes the potentially large configuration space by taking advantage of parameter dependencies. The authors have successfully incorporated the technique into the parameterized SOC tuning environment (Platune) and applied it to a number of applications.
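To make the notion of a Pareto-optimal configuration concrete, the sketch below filters a small set of (power, execution time) points down to those not dominated by any other point. The configuration names and numbers are invented for illustration, and the exhaustive pairwise comparison shown here is a plain baseline; it is not the dependency-based pruning technique the abstract describes.

```python
def pareto_front(configs):
    """Return the configurations that no other configuration dominates.

    configs: list of (name, power, time) tuples; lower is better for both metrics.
    A point is dominated if another point is no worse on both metrics
    and strictly better on at least one.
    """
    front = []
    for name, p, t in configs:
        dominated = any(
            (p2 <= p and t2 <= t) and (p2 < p or t2 < t)
            for _, p2, t2 in configs
        )
        if not dominated:
            front.append((name, p, t))
    return front


# Hypothetical measurements: (configuration, power in W, execution time in ms).
configs = [
    ("A", 1.0, 10.0),
    ("B", 2.0, 6.0),
    ("C", 2.5, 7.0),   # dominated by B
    ("D", 4.0, 3.0),
    ("E", 4.0, 5.0),   # dominated by D
]
print(pareto_front(configs))
```

With these made-up values, configurations A, B, and D remain: each trades more power for less time, and no remaining point is beaten on both metrics at once.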

Field Programmable Gate Arrays | 2004

A quantitative analysis of the speedup factors of FPGAs over processors

Zhi Guo; Walid A. Najjar; Frank Vahid; Kees A. Vissers

The speedup over a microprocessor that can be achieved by implementing some programs on an FPGA has been extensively reported. This paper presents an analysis, both quantitative and qualitative, at the architecture level of the components of this speedup. Obviously, the spatial parallelism that can be exploited on the FPGA is a big component. By itself, however, it does not account for the whole speedup. In this paper we experimentally analyze the remaining components of the speedup. We compare the performance of image processing application programs executing in hardware on a Xilinx Virtex E2000 FPGA to that on three general-purpose processor platforms: MIPS, Pentium III and VLIW. The question we set out to answer is what is the inherent advantage of a hardware implementation over a von Neumann platform. On the one hand, the clock frequency of general-purpose processors is about 20 times that of typical FPGA implementations. On the other hand, the iteration level parallelism on the FPGA is one to two orders of magnitude that on the CPUs. In addition to these two factors, we identify the efficiency advantage of FPGAs as an important factor and show that it ranges from 6 to 47 on our test benchmarks. We also identify some of the components of this factor: the streaming of data from memory, the overlap of control and data flow and the elimination of some instructions on the FPGA. The results provide a deeper understanding of the tradeoff between system complexity and performance when designing Configurable SoCs (CSoCs) as well as designing software for CSoCs. They also help understand the one to two orders of magnitude in speedup of FPGAs over CPUs after accounting for clock frequencies.

IEEE Transactions on Very Large Scale Integration Systems | 1998

SpecSyn: an environment supporting the specify-explore-refine paradigm for hardware/software system design

Daniel D. Gajski; Frank Vahid; Sanjiv Narayan; Jie Gong

System-level design issues are gaining increasing attention, as behavioral synthesis tools and methodologies mature. We present the SpecSyn system-level design environment, which supports the new specify-explore-refine (SER) design paradigm. This three-step approach to design includes precise specification of system functionality, rapid exploration of numerous system-level design options, and refinement of the specification into one reflecting the chosen option. A system-level design option consists of an allocation of system components, such as standard and custom processors, memories, and buses, and a partitioning of functionality among those components. After refinement, the functionality assigned to each component can then be synthesized to hardware or compiled to software. We describe the issues and approaches for each part of the SpecSyn environment. The new paradigm and environment are expected to lead to a more than ten times reduction in design time, and our experiments support this expectation.

Design, Automation, and Test in Europe | 2004

Automatic tuning of two-level caches to embedded applications

Ann Gordon-Ross; Frank Vahid; Nikil D. Dutt

The power consumed by the memory hierarchy of a microprocessor can contribute to as much as 50% of the total microprocessor system power, and is thus a good candidate for optimizations. We present an automated method for tuning two-level caches to embedded applications for reduced energy consumption. The method is applicable to both a simulation-based exploration environment and a hardware-based system prototyping environment. We introduce the two-level cache tuner, or TCaT - a heuristic for searching the huge solution space of possible configurations. The heuristic interlaces the exploration of the two cache levels and searches the various cache parameters in a specific order based on their impact on energy. We show the integrity of our heuristic across multiple memory configurations and even in the presence of hardware/software partitioning - a common optimization capable of achieving significant speedups and/or reduced energy consumption. We apply our exploration heuristic to a large set of embedded applications. Our experiments demonstrate the efficacy of our heuristic: on average the heuristic examines only 7% of the possible cache configurations, but results in cache sub-system energy savings of 53%, only 1% more than the optimal cache configuration. In addition, the configured cache achieves an average speedup of 30% over the base cache configuration due to tuning of cache line size to the application's needs.

Design Automation Conference | 1992

Specification partitioning for system design

Frank Vahid; Daniel D. Gajski

The authors focus on the goal of partitioning a behavior to satisfy chip-capacity constraints while considering system-performance constraints. A hardware implementation is assumed with a uniform chip technology. A new approach is introduced which partitions entire computations of a behavioral specification, such as processes and procedures, into chip behavioral specifications. The usefulness of the approach was demonstrated. The results of partitioning several examples using the specification partitioning tool being developed are provided.
