Soheil Ghiasi

University of California, Davis

Publication

Featured research published by Soheil Ghiasi.

International Conference on Computer Aided Design | 2004

A unified theory of timing budget management

Soheil Ghiasi; Elaheh Bozorgzadeh; Siddharth Choudhuri; Majid Sarrafzadeh

This work presents a theoretical framework that optimally solves many open problems in time budgeting. Our approach unifies a large class of existing time-management paradigms. Examples include time budgeting for maximizing total weighted delay relaxation, minimizing the maximum relaxation and min-skew time budget distribution. We show that many of the time management problems can be transformed into a min-cost flow instance that can be optimally and efficiently solved through well-known combinatorial techniques. Experiments include mapping of several designs, which are implemented using parameterized CoreGen IP cores, on Xilinx FPGA devices. Different time budgeting policies have been applied during the mapping stage. Our time management techniques always improved the area requirement of the implemented testbenches compared to a widely-used path-based method. We also compared the maximum budgeting and fairness in delay budget assignments. Our experimental results show that an average improvement of 19% in area can be achieved when fairness and maximum budgeting policies are combined, compared to pure maximum budgeting.

ACM Transactions on Embedded Computing Systems | 2004

An optimal algorithm for minimizing run-time reconfiguration delay

Soheil Ghiasi; Ani Nahapetian; Majid Sarrafzadeh

Reconfiguration delay is one of the major barriers to dynamically adapting a system to its application requirements. The run-time reconfiguration delay is comparable to the application latency for many classes of applications and might even dominate the application run-time. In this paper, we present an efficient optimal algorithm for minimizing the run-time reconfiguration (context switching) delay of executing an application on a dynamically adaptable system. The system is composed of a number of cameras with embedded reconfigurable resources collaborating in order to track an object. The operations required to track the object are revealed to the system at run-time and can change according to a number of parameters, such as the target shape and proximity. Similarly, we can assume that the applications comprising tasks are already scheduled and each of them has to be realized on the reconfigurable fabric in order to be executed. The modeling and the algorithm are both applicable to partially reconfigurable platforms as well as multi-FPGA systems. The algorithm can be directly applied to minimize the application run-time for the typical classes of applications, where the actual execution delay of the basic operations is negligible compared to the reconfiguration delay. We prove the optimality and the efficiency of our algorithm. We report the experimental results, which demonstrate a 2.5% to 40% improvement in the total run-time reconfiguration delay as compared to other heuristics.

Asia and South Pacific Design Automation Conference | 2016

Design space exploration of FPGA-based Deep Convolutional Neural Networks

Mohammad Motamedi; Philipp Gysel; Venkatesh Akella; Soheil Ghiasi

Deep Convolutional Neural Networks (DCNN) have proven to be very effective in many pattern recognition applications, such as image classification and speech recognition. Due to their computational complexity, DCNNs demand implementations that utilize custom hardware accelerators to meet performance and energy-efficiency constraints. In this paper we propose an FPGA-based accelerator architecture which leverages all sources of parallelism in DCNNs. We develop analytical feasibility and performance estimation models that take into account various design and platform parameters. We also present a design space exploration algorithm for obtaining the implementation with the highest performance on a given platform. Simulation results with a real-life DCNN demonstrate that our accelerator outperforms other competing approaches, which disregard some sources of parallelism in the application. Most notably, our accelerator runs 1.9× faster than the state-of-the-art DCNN accelerator on the same FPGA device.

Design Automation Conference | 2003

Optimal integer delay budgeting on directed acyclic graphs

Elaheh Bozorgzadeh; Soheil Ghiasi; Atsushi Takahashi; Majid Sarrafzadeh

The delay budget is the excess delay each component of a design can tolerate under a given timing constraint. Delay budgeting has been widely exploited to improve design quality. We present an optimal integer delay budgeting algorithm. Because of numerical instability and the discreteness of component libraries during library mapping in the design optimization flow, an integer solution for delay budgeting is essential. We prove that the integer budgeting problem, a 20-year-old open problem in design optimization based on Y. Liao and C.K. Wong (1983), can be solved optimally in polynomial time. We applied optimal delay budgeting to mapping applications on an FPGA platform using pre-optimized cores from FPGA libraries. For each application we go through the synthesis and place-and-route stages in order to obtain accurate results. Our optimal algorithm outperforms the ZSA algorithm of R. Nair et al. (1989) in terms of area by 10% on average over all applications. In some applications, optimal delay budgeting can speed up the place-and-route runtime by up to 2 times.
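
As an illustration of the notion of excess delay described in this abstract, the following is a minimal sketch that computes per-node slack on a directed acyclic graph under a timing constraint. It is a standard arrival/required-time calculation and not the paper's optimal integer budgeting algorithm. The function name `node_slacks`, the input layout, and the example graph are assumptions made for illustration only. The slack of a node is only an upper bound on the budget it could receive in isolation; distributing budgets across nodes that share a path is the harder problem the paper addresses.

```python
from collections import defaultdict, deque


def node_slacks(delays, edges, timing_constraint):
    """Return the slack of every node in a DAG (hypothetical helper).

    delays: dict mapping node -> intrinsic delay
    edges: iterable of (u, v) pairs meaning u feeds v
    timing_constraint: latest allowed completion time at any sink
    """
    succs = defaultdict(list)
    preds = defaultdict(list)
    indegree = {v: 0 for v in delays}
    for u, v in edges:
        succs[u].append(v)
        preds[v].append(u)
        indegree[v] += 1

    # Topological order via Kahn's algorithm.
    order = []
    queue = deque(v for v in delays if indegree[v] == 0)
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succs[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(delays):
        raise ValueError("graph contains a cycle")

    # Forward pass: earliest completion time of each node.
    arrival = {}
    for v in order:
        arrival[v] = delays[v] + max((arrival[u] for u in preds[v]), default=0)

    # Backward pass: latest completion time that still meets the constraint.
    required = {}
    for v in reversed(order):
        required[v] = min(
            (required[w] - delays[w] for w in succs[v]),
            default=timing_constraint,
        )

    return {v: required[v] - arrival[v] for v in delays}


# Small diamond-shaped example (values chosen only for illustration).
example_delays = {"a": 2, "b": 3, "c": 1, "d": 2}
example_edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
example_slacks = node_slacks(example_delays, example_edges, timing_constraint=10)
```

In this example the critical path a, b, d has a total delay of 7 against a constraint of 10, so each node on it has a slack of 3, while the off-critical node c has a slack of 5.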

Asia and South Pacific Design Automation Conference | 2003

Optimal reconfiguration sequence management

Soheil Ghiasi; Majid Sarrafzadeh

In this paper, we present an efficient optimal algorithm for minimizing the runtime reconfiguration (context switching) delay of executing an application on a reconfigurable system. We assume that the basic operations of the application are already scheduled and each of them has to be realized on the reconfigurable fabric in order to be executed. The modeling and algorithm are both applicable to partially reconfigurable platforms as well as multi-FPGA systems. The algorithm can be directly applied to minimize the application runtime for many typical classes of applications, where the actual execution delay of basic operations is negligible compared to the reconfiguration delay. We prove the optimality and efficiency of our algorithm and report experimental results, which demonstrate a 2.5% to 40% improvement in total runtime reconfiguration delay.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2006

A Unified Theory of Timing Budget Management

Soheil Ghiasi; Elaheh Bozorgzadeh; Po-Kuan Huang; Roozbeh Jafari; Majid Sarrafzadeh

This paper presents a theoretical framework that solves many open problems in time budgeting optimally and in polynomial time. The approach unifies a large class of existing time-management paradigms. Examples include time budgeting for maximizing total weighted delay relaxation, minimizing the maximum relaxation, and min-skew time budget distribution. The authors develop a combinatorial framework through which they prove that many of the time-management problems can be transformed into a min-cost flow problem instance. The methodology is applied to intellectual-property-based datapath synthesis targeting field-programmable gate arrays. The synthesis flow maps the input operations to parameterized library modules, during which different time budgeting policies are applied. The techniques always improve the area requirement of the implemented test benches and consistently outperform a widely used competitor. The experiments verify that combining fairness and maximization objectives improves the results further as compared with pure maximum budgeting. The combined fairness and maximization objective improves the area by 25.8% and 28.7% in slice and LUT counts, respectively.

Architectures for Networking and Communications Systems | 2008

A programmable architecture for scalable and real-time network traffic measurements

Faisal Khan; Lihua Yuan; Chen-Nee Chuah; Soheil Ghiasi

Accurate and real-time traffic measurement is becoming increasingly critical for a large variety of applications including accounting, bandwidth provisioning and security analysis. Existing network measurement techniques, however, have major difficulty dealing with the large number of flows in today's high-speed networks and offer limited scalability with increasing link speeds. Consequently, the current state-of-the-art solutions have to resort to conservative sampling of the traffic stream and/or accounting for only a few frequent flows, which often fails to provide accurate estimates of traffic features. In this paper, we present a novel hardware-software co-designed solution that is programmable and adaptable to runtime situations, offering high throughput that can easily match current link speeds. The key to our design is orthogonalization of memory lookups from traffic measurements through our query-driven measurement scheme. We have prototyped our approach on a Xilinx platform using MicroBlaze soft-core processors integrated with Virtex-II Pro FPGA fabric. We demonstrate the scalability of our architecture and also compare it with a recent offline (non-real-time) sampling-based software alternative. The comparison shows that our architecture performs orders of magnitude better in terms of speed and throughput even when used as an offline solution.

ACM Multimedia | 2016

CNNdroid: GPU-Accelerated Execution of Trained Deep Convolutional Neural Networks on Android

Seyyed Salar Latifi Oskouei; Hossein Bakhshi Golestani; Matin Hashemi; Soheil Ghiasi

Many mobile applications running on smartphones and wearable devices would potentially benefit from the accuracy and scalability of deep CNN-based machine learning algorithms. However, performance and energy consumption limitations make the execution of such computationally intensive algorithms on mobile devices prohibitive. We present a GPU-accelerated library, dubbed CNNdroid [1], for execution of trained deep CNNs on Android-based mobile devices. Empirical evaluations show that CNNdroid achieves up to 60X speedup and 130X energy saving on current mobile devices. The CNNdroid open source library is available for download at https://github.com/ENCP/CNNdroid

The Journal of Supercomputing | 2004

Collaborative and reconfigurable object tracking

Soheil Ghiasi; Hyun Jin Moon; Ani Nahapetian; Majid Sarrafzadeh

Many applications perceive visual information through networks of embedded sensors. Intensive image processing computations have to be performed in order to process the perceived information. Such computations usually demand hardware implementations in order to exhibit real-time performance. Furthermore, many such applications are hard to characterize a priori, since they take different paths according to events happening in the scene at runtime. Hence, reconfigurable hardware devices are the only viable platform for implementing such applications, providing both real-time performance and dynamic adaptability for the system. In this paper, we present a collaborative and dynamically adaptive object tracking system that has been built in our lab. We exploit reconfigurable hardware devices embedded in a number of networked cameras in order to achieve our goal. We justify the need for dynamic adaptation of the system through scenarios and applications. Experimental results on a set of scenes show that our system works effectively for different scenarios of events through reconfiguration. Comparison with non-adaptive implementations verifies that our approach improves the system's robustness to scene variations and outperforms the traditional implementations.

International Symposium on Physical Design | 2004

Innovate or perish: FPGA physical design

Taraneh Taghavi; Soheil Ghiasi; Abhishek Ranjan; Salil Raje; Majid Sarrafzadeh

The recent past has seen a tremendous increase in the size of design circuits that can be implemented in a single FPGA. The size and complexity of modern FPGAs have far outpaced the innovations in FPGA physical design. The problems faced by FPGA designers are similar in nature to those that preoccupy ASIC designers, namely, interconnect delays and design management. However, this paper will show that a simple re-targeting of ASIC physical design methodologies and algorithms to the FPGA domain will not suffice. We will show that several well-researched problems in the ASIC world need new problem formulations and algorithms research to be useful for today's FPGAs. Partitioning, floorplanning, placement, and delay estimation schemes are only some of the topics that need a complete overhaul. We will give problem formulations, motivated by experimental results, for some of these topics as applicable in the FPGA domain.