Mehrdad Reshadi | Researchain

Publication

Featured research published by Mehrdad Reshadi.

design automation conference | 2003

Instruction set compiled simulation: a technique for fast and flexible instruction set simulation

Mehrdad Reshadi; Prabhat Mishra; Nikil D. Dutt

Instruction set simulators are critical tools for the exploration and validation of new programmable architectures. Due to the increasing complexity of architectures and time-to-market pressure, performance is the most important feature of an instruction-set simulator. Interpretive simulators are flexible but slow, whereas compiled simulators deliver speed at the cost of flexibility. This paper presents a novel technique for generation of fast instruction-set simulators that combines the benefits of both compiled and interpretive simulation. We achieve fast, instruction-accurate simulation through two mechanisms. First, we move the time-consuming decoding process from run time to compile time while maintaining the flexibility of interpretive simulation. Second, we use a novel instruction abstraction technique to generate aggressively optimized decoded instructions that further improve simulation performance. Our instruction set compiled simulation (IS-CS) technique delivers up to 40% performance improvement over the best-known published result that has the flexibility of interpretive simulation. We illustrate the applicability of the IS-CS technique using the ARM7 embedded processor.
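The first mechanism in the abstract above, moving decoding out of the simulation loop, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the 16-bit toy encoding, the opcode names and the helper functions are hypothetical and are neither the ARM7 instruction set nor the authors' IS-CS implementation. The sketch only contrasts an interpretive loop, which decodes on every executed instruction, with a loop that decodes each program word once up front. It does not model the second mechanism (instruction abstraction).

```python
from typing import List, Tuple

# Hypothetical 16-bit toy encoding (NOT ARM7): op | rd | rs | imm, 4 bits each.
HALT, MOVI, ADD, SUBI, BNZ = 0, 1, 2, 3, 4
Fields = Tuple[int, int, int, int]


def encode(op: int, rd: int = 0, rs: int = 0, imm: int = 0) -> int:
    """Pack the four 4-bit fields into one instruction word."""
    return (op << 12) | (rd << 8) | (rs << 4) | imm


def decode(word: int) -> Fields:
    """Unpack an instruction word into (op, rd, rs, imm)."""
    return (word >> 12) & 0xF, (word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF


def execute(fields: Fields, regs: List[int], pc: int) -> int:
    """Apply one decoded instruction; return the next pc, or -1 on HALT."""
    op, rd, rs, imm = fields
    if op == HALT:
        return -1
    if op == MOVI:
        regs[rd] = imm
    elif op == ADD:
        regs[rd] += regs[rs]
    elif op == SUBI:
        regs[rd] -= imm
    elif op == BNZ and regs[rd] != 0:
        return imm  # branch taken: imm is an absolute target
    return pc + 1


def run_interpretive(program: List[int]) -> Tuple[List[int], int]:
    """Decode inside the loop: one decode per executed instruction."""
    regs, pc, decodes = [0] * 16, 0, 0
    while pc >= 0:
        fields = decode(program[pc])
        decodes += 1
        pc = execute(fields, regs, pc)
    return regs, decodes


def run_predecoded(program: List[int]) -> Tuple[List[int], int]:
    """Decode every program word once, before simulation starts."""
    decoded = [decode(word) for word in program]
    regs, pc = [0] * 16, 0
    while pc >= 0:
        pc = execute(decoded[pc], regs, pc)
    return regs, len(decoded)


# Toy program: r0 = 5 + 4 + 3 + 2 + 1, using r1 as a down-counter.
PROGRAM = [
    encode(MOVI, rd=0, imm=0),
    encode(MOVI, rd=1, imm=5),
    encode(ADD, rd=0, rs=1),
    encode(SUBI, rd=1, imm=1),
    encode(BNZ, rd=1, imm=2),
    encode(HALT),
]

if __name__ == "__main__":
    print(run_interpretive(PROGRAM)[1], "decodes (interpretive)")
    print(run_predecoded(PROGRAM)[1], "decodes (pre-decoded)")
```

Both loops produce the same register state. The difference is that the decode count of the interpretive loop grows with the number of executed instructions, while the pre-decoded loop pays one decode per program word regardless of how often the loop body runs.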

international conference on hardware/software codesign and system synthesis | 2005

A cycle-accurate compilation algorithm for custom pipelined datapaths

Daniel D. Gajski; Mehrdad Reshadi

Traditional high-level synthesis (HLS) techniques generate a datapath and controller for a given behavioral description. The growing wiring cost and delay of today's technologies require aggressive optimizations, such as interconnect pipelining, that cannot be done after generating the datapath without invalidating the schedule. On the other hand, increasing manufacturing complexity demands approaches that favor design for manufacturability (DFM). To address these problems, we propose an approach in which the datapath of the architecture is fully allocated before scheduling and binding. We compile a C program directly to the datapath and generate the controller. We can support the entire ANSI C syntax because the datapath can be as complex as the datapath of a processor. Since there is no instruction abstraction in this architecture, we call it No-Instruction-Set-Computer (NISC). As the first step towards realization of a NISC-based design flow, we present an algorithm that maps an application onto a given datapath by performing scheduling and binding simultaneously. With this algorithm, we achieved up to 70% speedup on a NISC with a datapath similar to that of MIPS, compared to a MIPS gcc compiler. The algorithm also efficiently handles different datapath features such as pipelining, forwarding and multi-cycle units.
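To make "scheduling and binding simultaneously" on a pre-allocated datapath concrete, here is a minimal sketch of a generic resource-constrained list scheduler. It is not the algorithm from the paper: the function name, the operation and unit dictionaries, and the unit-latency assumption are all hypothetical, and pipelining, forwarding and multi-cycle units are deliberately left out. The only point is that each operation receives its cycle and its functional unit in the same step, against a fixed set of units.

```python
from typing import Dict, List, Tuple


def schedule_and_bind(
    ops: Dict[str, Tuple[str, List[str]]],
    units: Dict[str, str],
) -> Dict[str, Tuple[int, str]]:
    """Assign each operation a (cycle, unit) pair on a fixed datapath.

    ops:   op name -> (kind, names of ops it depends on)   [hypothetical format]
    units: unit name -> kind of operation it executes      [hypothetical format]
    Assumes every unit has a latency of one cycle.
    """
    finish: Dict[str, int] = {}              # op -> cycle its result is available
    placement: Dict[str, Tuple[int, str]] = {}
    remaining = list(ops)
    cycle = 0
    while remaining:
        free = dict(units)                   # all units are free at cycle start
        placed_this_cycle = 0
        for name in list(remaining):
            kind, deps = ops[name]
            ready = all(d in finish and finish[d] <= cycle for d in deps)
            if not ready:
                continue
            # Binding: pick the first free unit that supports this kind.
            unit = next((u for u, k in free.items() if k == kind), None)
            if unit is None:
                continue                     # resource conflict, retry next cycle
            del free[unit]
            placement[name] = (cycle, unit)  # scheduling and binding together
            finish[name] = cycle + 1
            remaining.remove(name)
            placed_this_cycle += 1
        if placed_this_cycle == 0:
            raise ValueError("no progress: cyclic dependency or missing unit kind")
        cycle += 1
    return placement


# Tiny dataflow graph: d = (a_in * b_in + c_in * d_in) + e_in
OPS = {
    "a": ("mul", []),
    "b": ("mul", []),
    "c": ("add", ["a", "b"]),
    "d": ("add", ["c"]),
}

if __name__ == "__main__":
    print(schedule_and_bind(OPS, {"mul0": "mul", "alu0": "add"}))
    print(schedule_and_bind(OPS, {"mul0": "mul", "mul1": "mul", "alu0": "add"}))
```

With one multiplier the two multiplications are serialized, so the final addition lands one cycle later than with two multipliers. This is the kind of trade-off a datapath-first flow exposes to the compiler.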

ACM Transactions on Embedded Computing Systems | 2009

Hybrid-compiled simulation: An efficient technique for instruction-set architecture simulation

Mehrdad Reshadi; Prabhat Mishra; Nikil D. Dutt

Instruction-set simulators are critical tools for the exploration and validation of new processor architectures. Due to the increasing complexity of architectures and time-to-market pressure, performance is the most important feature of an instruction-set simulator. Interpretive simulators are flexible but slow, whereas compiled simulators deliver speed at the cost of flexibility and compilation overhead. This article presents a hybrid instruction-set-compiled simulation (HISCS) technique for generation of fast instruction-set simulators that combines the benefit of both compiled and interpretive simulation. This article makes two important contributions: (i) it improves the interpretive simulation performance by applying compiled simulation at the instruction level using a novel template-customization technique to generate optimized decoded instructions during compile time; and (ii) it reduces the compile-time overhead by combining the benefits of both static and dynamic-compiled simulation. Our experimental results using two contemporary processors (ARM7 and SPARC) demonstrate an order-of-magnitude reduction in compilation time as well as a 70% performance improvement, on average, over the best-known published result in instruction-set simulation.

international conference on computer design | 2005

Utilizing horizontal and vertical parallelism with a no-instruction-set compiler for custom datapaths

Mehrdad Reshadi; Bita Gorjiara; Daniel D. Gajski

The performance of programs can be improved by utilizing their horizontal and vertical parallelism. In some processors (VLIW-based), the compiler can utilize horizontal parallelism by controlling the schedule of independent operations. Vertical parallelism is utilized through pipelining. However, in all processors, the structure of the pipeline is fixed and the compiler has no control over it. In application-specific instruction-set processors (ASIPs), the pipeline structure can be customized and utilized in the program through custom instructions. Practical constraints on the instruction decoder limit the number and complexity of custom instructions in ASIPs. Detecting the frequent and beneficial custom instructions and incorporating them in the compiler are complex and sometimes very time-consuming tasks. In this paper, we present an architecture that does not limit the number of custom functionalities that can be implemented on its datapath. Instead of using custom instructions and then relying on the decoder in hardware to generate the control signals, we generate the control signal values in the compiler. Since there are no predefined instructions in this architecture, we call it no-instruction-set-computer (NISC). The NISC compiler maps the application directly onto the datapath. It has complete fine-grained control over the datapath and hence can make very good use of the hardware resources as well as the horizontal and vertical parallelism in the program. We also explain the algorithm for mapping the control/data flow graph (CDFG) of a program onto a given datapath in NISC. Using our algorithm and a NISC architecture with the datapath of a MIPS, we achieved up to 70% speedup over the traditional MIPS compiler. In another experiment, we started from a base architecture and customized it by adding resources and interconnect to increase its horizontal and vertical parallelism. The algorithm achieved a speedup of up to 15.5 times by utilizing the available parallelism in the program and the datapath.

ACM Transactions on Embedded Computing Systems | 2006

A retargetable framework for instruction-set architecture simulation

Mehrdad Reshadi; Nikil D. Dutt; Prabhat Mishra

Instruction-set architecture (ISA) simulators are an integral part of today's processor and software design process. While the increasing complexity of architectures demands high-performance simulation, the increasing variety of available architectures makes retargetability a critical feature of an instruction-set simulator. Retargetability requires generic models, while high performance demands target-specific customizations. To address these contradictory requirements, we have developed a generic instruction model and a generic decode algorithm that facilitate easy and efficient retargetability of the ISA simulator for a wide range of processor architectures, such as RISC, CISC, VLIW, and variable-length instruction-set processors. The instruction model is used to generate compact and easy-to-debug instruction descriptions that are very similar to those of an architecture manual. These descriptions are used to generate high-performance simulators. Our retargetable framework combines the flexibility of interpretive simulation with the speed of compiled simulation. The generation of the simulator is completely separate from the simulation engine. Hence, we can incorporate any fast simulation technique in our retargetable framework without introducing any performance penalty. To demonstrate this, we have incorporated the fast IS-CS simulation engine in our retargetable framework, which has yielded a 70% performance improvement over the best-known simulators in this category. We illustrate the retargetability of our approach using two popular, yet different, realistic architectures: the SPARC and the ARM.

asia and south pacific design automation conference | 2004

Fast and efficient voltage scheduling by evolutionary slack distribution

Bita Gorjiara; Pai H. Chou; Nader Bagherzadeh; Mehrdad Reshadi; David W. Jensen

To minimize energy consumption by voltage scaling in design of heterogeneous real-time embedded systems, it is necessary to perform two distinct tasks: task scheduling (TS) and voltage selection (VS). Techniques proposed to date either are fast but yield inefficient results, or output efficient solutions after many slow iterations. As a core problem to solve in the inner loop of a system-level optimization cycle, it is critical that the algorithm be fast while producing high quality results. This paper presents a new technique called Evolutionary Relative Slack Distribution Voltage Scheduling (ERSD-VS) that achieves both speed and efficiency. It addresses priority adjustment and slack distribution issues with low cost heuristics. Experimental results from running publicly available testbenches show up to 42% energy saving compared to a published technique called EVEN-VS. It also shows up to 70 times speed improvement compared to an efficient technique called EE-GLSA.
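The link between slack and energy that motivates voltage scheduling can be shown with a small worked example. This is not ERSD-VS and uses none of its heuristics. It is a textbook-style sketch under simplifying assumptions that are not stated in the abstract: tasks run back to back on a single processor, voltage is continuously adjustable, clock frequency scales linearly with voltage, and dynamic energy per cycle scales with the square of the voltage. The function names are hypothetical.

```python
from typing import List, Tuple


def energy(cycles: float, voltage: float) -> float:
    """Normalized dynamic energy, assuming energy per cycle ~ voltage**2."""
    return cycles * voltage ** 2


def distribute_slack_proportionally(
    task_cycles: List[float], deadline: float, f_max: float = 1.0
) -> Tuple[float, List[float]]:
    """Stretch every task by the same factor so the chain just meets the deadline.

    Returns (normalized speed, per-task time budgets). Speed 1.0 means full
    voltage; assumes frequency is proportional to voltage (a simplification).
    """
    total_cycles = sum(task_cycles)
    speed = min(1.0, total_cycles / (deadline * f_max))
    budgets = [c / (speed * f_max) for c in task_cycles]
    return speed, budgets


if __name__ == "__main__":
    cycles = [100.0, 300.0]
    speed, budgets = distribute_slack_proportionally(cycles, deadline=800.0)
    full = sum(energy(c, 1.0) for c in cycles)
    scaled = sum(energy(c, speed) for c in cycles)
    print(f"speed={speed}, budgets={budgets}, saving={1 - scaled / full:.0%}")
```

In this toy case the chain needs 400 time units at full speed against a deadline of 800, so halving the speed (and, under the stated assumptions, the voltage) uses all the slack and cuts the normalized energy from 400 to 100. Real systems have discrete voltage levels, heterogeneous processors and task-graph dependencies, which is why the choice of how to distribute slack is a scheduling problem in its own right.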

design, automation, and test in europe | 2005

Generic Pipelined Processor Modeling and High Performance Cycle-Accurate Simulator Generation

Mehrdad Reshadi; Nikil D. Dutt

Detailed modeling of processors and high-performance cycle-accurate simulators are essential for today's hardware and software design. These problems are challenging enough by themselves and have seen many previous research efforts. Addressing both simultaneously is even more challenging, with many existing approaches focusing on one over the other. In this paper, we propose the reduced colored Petri net (RCPN) model, which has two advantages: first, it offers a very simple and intuitive way of modeling pipelined processors; second, it can generate high-performance cycle-accurate simulators. RCPN benefits from all the useful features of colored Petri nets without suffering from their exponential growth in complexity. RCPN processor models are very intuitive since they are a mirror image of the processor pipeline block diagram. Furthermore, in our experiments on the generated cycle-accurate simulators for XScale and StrongARM processor models, we achieved an order of magnitude (~15 times) speedup over the popular SimpleScalar ARM simulator.

international conference on computer design | 2006

Generic Architecture Description for Retargetable Compilation and Synthesis of Application-Specific Pipelined IPs

Bita Gorjiara; Mehrdad Reshadi; Daniel D. Gajski

Constraints of embedded systems and the shrinking time-to-market have elevated the importance of designer productivity and design predictability more than ever. To improve productivity, in ASIP approaches the system is designed in software and executed on a customized processor. In an ASIP design flow, the processor is described in an Architecture Description Language (ADL) and the toolset is generated from that ADL automatically. However, in these approaches design predictability is low because the designer has little or no control over the quality of the final implementation. In this paper, we present a new design approach where the target processor or Intellectual Property (IP) does not have any predefined instruction set and its datapath component netlist is described in a Generic Netlist Representation (GNR). The GNR is used by the toolset to generate the controller of the IP and the RTL of the design. The GNR is an order of magnitude shorter than state-of-the-art ADLs with RTL generation capabilities and yet can capture any structural details that affect the implementation quality. We have also developed a web-based interface for our toolset, so that users can upload and evaluate new IPs described in GNR.

international conference on computer design | 2003

Reducing compilation time overhead in compiled simulators

Mehrdad Reshadi; Nikil D. Dutt

Compiled simulation is a well-known technique for improving the performance of instruction set simulators at the cost of compilation time. However, the compilation time overhead makes such usage of compiler optimizations impractical, especially for large applications. We propose a hybrid compiled simulation approach that is simple, generates an optimized decoder and has almost no compilation overhead compared to static compiled simulation. Using two contemporary processor models, ARM7 and SPARC, we demonstrate that our technique can reduce the compilation time by 99% on average, from several thousands of seconds to only tens of seconds.

design automation conference | 2008

C-based design flow: a case study on G.729A for voice over internet protocol (VoIP)

Mehrdad Reshadi; Bita Gorjiara; Daniel D. Gajski

In this paper, we present the design of a G.729A codec in a C-based design flow. The codec is used in VoIP applications for sending speech over the Internet Protocol. We started from the standard reference C implementation and generated several customized designs using the NISCT C-to-RTL toolset. Our final designs could run at very low clock frequencies (11 MHz for the decoder and 30 MHz for the coder) while meeting the timing requirements of the standard. We present these designs and the corresponding C-based design flow in this paper.
