Publication


Featured research published by Bita Gorjiara.


International Conference on Computer Design | 2005

Utilizing horizontal and vertical parallelism with a no-instruction-set compiler for custom datapaths

Mehrdad Reshadi; Bita Gorjiara; Daniel D. Gajski

The performance of programs can be improved by utilizing their horizontal and vertical parallelism. In some processors (VLIW-based), the compiler can utilize horizontal parallelism by controlling the schedule of independent operations. Vertical parallelism is utilized through pipelining. However, in all processors, the structure of the pipeline is fixed and the compiler has no control over it. In application-specific instruction-set processors (ASIPs), the pipeline structure can be customized and utilized in the program through custom instructions. Practical constraints on the instruction decoder limit the number and complexity of custom instructions in ASIPs. Detecting the frequent and beneficial custom instructions and incorporating them in the compiler are complex and sometimes very time-consuming tasks. In this paper, we present an architecture that does not limit the number of custom functionalities that can be implemented on its datapath. Instead of using custom instructions and then relying on the decoder in hardware to generate the control signals, we generate the control signal values in the compiler. Since there are no predefined instructions in this architecture, we call it a no-instruction-set computer (NISC). The NISC compiler maps the application directly onto the datapath. It has complete fine-grained control over the datapath and hence can make good use of the hardware resources as well as the horizontal and vertical parallelism in the program. We also explain the algorithm for mapping the CDFG of a program onto a given datapath in NISC. Using our algorithm and a NISC architecture with the datapath of a MIPS, we achieved up to 70% speedup over the traditional MIPS compiler. In another experiment, we started from a base architecture and customized it by adding resources and interconnect to increase its horizontal and vertical parallelism. The algorithm achieved up to 15.5 times speedup by utilizing the available parallelism in the program and the datapath.
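
The idea of generating control signal values in the compiler, rather than decoding instructions in hardware, can be illustrated with a toy scheduler. The sketch below is not the paper's CDFG mapping algorithm: it is a plain greedy list scheduler, and the names `Op` and `schedule`, the single-cycle latency, and the dictionary form of a control word are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str          # name of the value this operation produces
    kind: str          # functional-unit type it needs, e.g. "alu" or "mul"
    srcs: tuple = ()   # names of the values it reads


def schedule(ops, units, inputs=()):
    """Greedy list scheduling that emits one control word per cycle.

    ops    -- list of Op, in priority order
    units  -- dict: unit kind -> number of instances, e.g. {"alu": 2}
    inputs -- names of values available before the first cycle
    Every operation is assumed to take exactly one cycle (hypothetical).
    """
    done = set(inputs)      # values available at the start of the cycle
    pending = list(ops)
    words = []
    while pending:
        free = dict(units)
        word = {}           # unit instance -> operation issued this cycle
        for op in pending:
            ready = all(s in done for s in op.srcs)
            if ready and free.get(op.kind, 0) > 0:
                free[op.kind] -= 1
                slot = f"{op.kind}{units[op.kind] - free[op.kind] - 1}"
                word[slot] = op.name
        if not word:
            raise ValueError("nothing schedulable: cyclic graph or missing unit")
        done.update(word.values())
        pending = [op for op in pending if op.name not in done]
        words.append(word)
    return words


# Five operations; t3 depends on t1 and t2, and t4 depends on t3.
ops = [
    Op("t1", "alu"),
    Op("t2", "alu"),
    Op("t3", "mul", ("t1", "t2")),
    Op("t4", "alu", ("t3",)),
    Op("t5", "alu"),
]
wide = schedule(ops, {"alu": 2, "mul": 1})    # 3 control words
narrow = schedule(ops, {"alu": 1, "mul": 1})  # 4 control words
```

Adding a second ALU shortens the schedule from four control words to three, which is the kind of horizontal parallelism the abstract describes the compiler exploiting when the datapath is widened.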


Design Automation Conference | 2008

Automatic architecture refinement techniques for customizing processing elements

Bita Gorjiara; Daniel D. Gajski

In this paper, we propose an approach for designing high-performance energy-efficient processing elements (PEs) using statically-scheduled nanocode-based architectures. Our approach is based on bottom-up refinement/trimming techniques that optimize a given datapath irrespective of whether it was designed manually or generated automatically. The optimizations can also preserve parts of the netlist specified by the designers, and hence, allow reuse of design efforts and can lead to predictable convergence. In this paper, we show that trimming unused and underutilized resources of typical general-purpose datapaths can lead to 30-40% average energy savings, without any performance loss. However, general-purpose architectures often compromise parallelism to make the design implementable. With our trimming approach, we can afford to have a base architecture that is not intended for implementation and has more parallelism, and then apply refinement to make it implementable. For our benchmarks, we achieved up to 1.8 times (avg. 25%) and 2.6 times (avg. 40%) performance improvement, compared to two general-purpose architectures (i.e., a 4-issue VLIW and a DLX), respectively. Additionally, the energy consumption is reduced by up to 5 times (avg. 2 times) compared to the trimmed general-purpose architectures.


Asia and South Pacific Design Automation Conference | 2004

Fast and efficient voltage scheduling by evolutionary slack distribution

Bita Gorjiara; Pai H. Chou; Nader Bagherzadeh; Mehrdad Reshadi; David W. Jensen

To minimize energy consumption by voltage scaling in the design of heterogeneous real-time embedded systems, it is necessary to perform two distinct tasks: task scheduling (TS) and voltage selection (VS). Techniques proposed to date are either fast but yield inefficient results, or output efficient solutions after many slow iterations. Because this is a core problem solved in the inner loop of a system-level optimization cycle, it is critical that the algorithm be fast while producing high-quality results. This paper presents a new technique called Evolutionary Relative Slack Distribution Voltage Scheduling (ERSD-VS) that achieves both speed and efficiency. It addresses priority adjustment and slack distribution issues with low-cost heuristics. Experimental results from running publicly available testbenches show up to 42% energy saving compared to a published technique called EVEN-VS. It also shows up to 70 times speed improvement compared to an efficient technique called EE-GLSA.
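
Why distributing slack saves energy can be seen in a small worked example. This is not ERSD-VS: it is the simplest possible case, a chain of tasks on one processor with all slack spread proportionally. The function name `distribute_slack` and the energy model (voltage proportional to frequency, energy per cycle proportional to the square of the voltage) are assumptions for illustration only.

```python
def distribute_slack(cycles, deadline, f_max=1.0):
    """Proportional slack distribution for a chain of tasks on one processor.

    cycles   -- worst-case cycle count of each task
    deadline -- time available for the whole chain
    f_max    -- maximum clock frequency (cycles per time unit)

    Returns (per-task frequencies, energy relative to running at f_max).
    Assumed model: voltage scales linearly with frequency, and the energy
    of one cycle is proportional to voltage squared.
    """
    t_min = sum(cycles) / f_max
    if t_min > deadline:
        raise ValueError("deadline cannot be met even at f_max")
    stretch = deadline / t_min            # every task is slowed equally
    freq = f_max / stretch
    rel_energy = (freq / f_max) ** 2      # same ratio applies to every cycle
    return [freq] * len(cycles), rel_energy


# 400 cycles of work with a deadline of 800 time units at f_max = 1.0:
# each task can run at half speed, using a quarter of the energy.
freqs, rel_energy = distribute_slack([100, 200, 100], deadline=800)
```

Real schedulers such as the ones named in the abstract must also handle multiple processors, task ordering, and discrete voltage levels, none of which this sketch models.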


Embedded Systems for Real-Time Multimedia | 2005

Custom processor design using NISC: a case-study on DCT algorithm

Bita Gorjiara; Daniel D. Gajski

Designing application-specific instruction-set processors (ASIPs) usually requires designing a custom datapath and modifying the instruction set, instruction decoder, and compiler. A new alternative to ASIPs is the no-instruction-set computer (NISC), which eliminates the instruction abstraction by compiling programs directly to a given datapath. The compiler analyzes the datapath and extracts possible operations and data flows. The NISC approach simplifies and accelerates the task of custom processor design. In this paper, we present a case study of designing a custom datapath for a 2D DCT algorithm. We applied several optimization techniques such as software transformations, operation chaining, datapath pipelining, controller pipelining, and functional unit customization to improve the quality of the design. Most of the techniques are general and can be applied to other applications. The result of synthesizing our final custom datapath on a Xilinx FPGA shows 7.14 times performance improvement, 1.64 times power reduction, 12.5 times energy savings, and more than 3 times area reduction compared to a soft-core MIPS implementation.


ACM Transactions on Design Automation of Electronic Systems | 2007

Ultra-fast and efficient algorithm for energy optimization by gradient-based stochastic voltage and task scheduling

Bita Gorjiara; Nader Bagherzadeh; Pai H. Chou

This paper presents a new technique, called Adaptive Stochastic Gradient Voltage-and-Task Scheduling (ASG-VTS), for power optimization of multicore hard real-time systems. ASG-VTS combines stochastic and energy-gradient techniques to simultaneously solve the slack distribution and task reordering problems. It produces very efficient results with few mode transitions. Our experiments show that ASG-VTS reduces the number of mode transitions by 4.8 times compared to traditional energy-gradient-based approaches. Also, our heuristic algorithm can quickly find a solution that is as good as the optimal for a real-life GSM encoder/decoder benchmark. ASG-VTS runs 150 times and 1034 times faster than the energy-gradient-based and optimal ILP algorithms, respectively. Since the runtime of ASG-VTS is very low, it is ideal for design space exploration in system-level design tools. We have also developed a web-based interface for the ASG-VTS algorithm.


International Conference on Computer Design | 2006

Generic Architecture Description for Retargetable Compilation and Synthesis of Application-Specific Pipelined IPs

Bita Gorjiara; Mehrdad Reshadi; Daniel D. Gajski

Constraints of embedded systems and the shrinking time-to-market have elevated the importance of designer productivity and design predictability more than ever. To improve productivity, in ASIP approaches the system is designed in software and executed on a customized processor. In the ASIP design flow, the processor is described in an Architecture Description Language (ADL) and the toolset is generated from that ADL automatically. However, in these approaches design predictability is low because the designer has little or no control over the quality of the final implementation. In this paper, we present a new design approach where the target processor or Intellectual Property (IP) does not have any predefined instruction set and its datapath component netlist is described in a Generic Netlist Representation (GNR). The GNR is used by the toolset to generate the controller of the IP and the RTL of the design. The GNR is an order of magnitude shorter than state-of-the-art ADLs with RTL generation capabilities and yet can capture any structural details that affect the implementation quality. We have also developed a web-based interface for our toolset, so that users can upload and evaluate new IPs described in GNR.


International Conference on Hardware/Software Codesign and System Synthesis | 2006

Generic netlist representation for system and PE level design exploration

Bita Gorjiara; Mehrdad Reshadi; Pramod Chandraiah; Daniel D. Gajski

Designer productivity and design predictability are vital factors for successful embedded system design. Shrinking time-to-market and increasing complexity of these systems require more productive design approaches starting from high-level languages such as C. On the other hand, tight constraints of embedded systems require careful design exploration at the system level (coarse-grained exploration) and at the processing-element (PE) level (fine-grained exploration). In this paper, we present GNR, a formal modeling approach developed to improve the productivity of designing systems and processing elements, in the same way that traditional ADLs improved productivity for designing processors. The GNR is an order of magnitude shorter than state-of-the-art ADLs with RTL generation capabilities and yet can capture any structural details that affect the implementation quality. Using relatively short GNR descriptions, we explored several designs for implementing an MP3 decoder and achieved a speedup of 3.25 compared to a MicroBlaze processor. We have also developed a web-based interface for our tools, so that users can upload and evaluate new architectures described in GNR. Our toolset and GNR are an intermediate step towards synthesis of TLM to RTL.


Asia and South Pacific Design Automation Conference | 2006

Designing a custom architecture for DCT using NISC technology

Bita Gorjiara; Mehrdad Reshadi; Daniel D. Gajski

This paper presents the design of a custom architecture for the discrete cosine transform (DCT) using no-instruction-set computer (NISC) technology, which is developed for fast processor customization. Using several software transformations and hardware customization, we achieved up to 10 times performance improvement, 2 times power reduction, 12.8 times energy reduction, and 3 times area reduction compared to an already-optimized soft-core MIPS implementation.


Digital Systems Design | 2006

A Graph Based Algorithm for Data Path Optimization in Custom Processors

Jelena Trajkovic; Mehrdad Reshadi; Bita Gorjiara; Daniel D. Gajski

The rising complexity, customization, and short time to market of modern digital systems require automatic methods for generating high-performance architectures for such systems. This paper presents algorithms to automatically create a custom data path for a given application that optimizes both resource utilization and performance. The inputs to the architecture generator include application source code, operation execution frequency obtained from a profiling run, and a component library (consisting of ALUs, buses, multiplexers, etc.). The output is the application-specific data path, specified as the set of resource instances and their connections. The algorithm starts with a dense architecture and iteratively refines it until an efficient architecture is derived. The key optimization goal is to keep performance within given boundaries while maximizing resource utilization. Our experimental results show that generated architectures are comparable to manual designs, but can be obtained in a matter of a few seconds, thereby leading to significant productivity gains.
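
The refinement loop described above (start dense, remove resources while performance stays within bounds) can be sketched in a few lines. This is not the paper's graph-based algorithm: the function names, the greedy least-used-first order, and especially the toy cycle estimator are hypothetical stand-ins, and interconnect is not modelled at all.

```python
import math


def estimate_cycles(config, usage):
    """Toy cost model (an assumption): uses of each resource kind are
    spread evenly over its instances, and the kinds run one after another."""
    total = 0
    for name, uses in usage.items():
        if uses == 0:
            continue
        if config.get(name, 0) == 0:
            return math.inf          # a needed resource is missing
        total += math.ceil(uses / config[name])
    return total


def refine(dense, usage, max_cycles):
    """Drop instances, least-used resource first, while the estimated
    cycle count stays within max_cycles.

    dense      -- dict: resource name -> instance count to start from
    usage      -- dict: resource name -> profiled use count
    max_cycles -- performance bound that must not be exceeded
    """
    current = dict(dense)
    for name in sorted(current, key=lambda r: usage.get(r, 0)):
        while current[name] > 0:
            trial = dict(current)
            trial[name] -= 1
            if estimate_cycles(trial, usage) > max_cycles:
                break                # removing one more would be too slow
            current = trial
    return {r: n for r, n in current.items() if n > 0}


usage = {"alu": 8, "mul": 2, "shifter": 0}
dense = {"alu": 4, "mul": 4, "shifter": 2}
trimmed = refine(dense, usage, max_cycles=6)   # {"alu": 2, "mul": 1}
```

With this toy model the unused shifter disappears entirely and the multiplier count falls to one, while two ALUs are kept because a single ALU would exceed the six-cycle bound.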


Processor Description Languages: Applications and Methodologies | 2008

GNR: A Formal Language for Specification, Compilation, and Synthesis of Custom Embedded Processors

Bita Gorjiara; Mehrdad Reshadi; Daniel D. Gajski

Publisher Summary: This chapter focuses on generic netlist representation (GNR), which is used for describing general-purpose and custom embedded processors as well as multicore systems using no-instruction-set-computer (NISC) technology. An NISC architecture is composed of a datapath and a controller, and the datapath of NISC can be simple or as complex as the datapath of a processor. In NISC, the target architecture is a statically scheduled nanocoded architecture that does not have any predefined instruction set. Thus GNR does not contain any instruction-set or instruction decoder specification and hence is very concise, which then can be used to synthesize high-quality hardware. The GNR is strongly typed and defines extensible aspects for components so that different tools such as the compiler. NISC uses nanocode that determines the actual control configuration of every low-level datapath component, such as multiplexers, registers, and functional units. To overcome the increased code size of NISC, the nanocodes are compressed and a low-overhead decompression stage is added during controller synthesis. The GNR can also be used to describe multicore systems on a pin-accurate level.
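
The summary does not say which compression scheme is used for the nanocode. As a purely illustrative assumption, the sketch below uses a dictionary scheme: each distinct control word is stored once in a table, and the program memory holds only short indices into it. The function names are hypothetical.

```python
def compress(words):
    """Replace each control word with an index into a table of unique words."""
    table, index_of, indices = [], {}, []
    for word in words:
        if word not in index_of:
            index_of[word] = len(table)
            table.append(word)
        indices.append(index_of[word])
    return table, indices


def decompress(table, indices):
    """Table lookup, standing in for a hardware decompression stage."""
    return [table[i] for i in indices]


def size_bits(words, width):
    """Return (uncompressed bits, compressed bits) for words of `width` bits."""
    table, indices = compress(words)
    index_width = max(1, (len(table) - 1).bit_length())
    raw = len(words) * width
    packed = len(table) * width + len(indices) * index_width
    return raw, packed


# Six 64-bit control words, only three of them distinct.
program = [0b1010, 0b0001, 0b1010, 0b1010, 0b0001, 0b1111]
table, indices = compress(program)
raw, packed = size_bits(program, width=64)   # 384 bits vs. 204 bits
```

The saving depends on how often control words repeat: wide words with few distinct values compress well under this scheme, while a program whose words are all different gains nothing.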

Collaboration


Dive into Bita Gorjiara's collaborations.

Top Co-Authors


Pai H. Chou

University of California


Nikil D. Dutt

University of California
