Amir Fijany

Istituto Italiano di Tecnologia

Publication

Featured research published by Amir Fijany.

ieee aerospace conference | 2011

Image processing applications on a low power highly parallel SIMD architecture

Amir Fijany; Fouzhan Hosseini

In this paper, we present and discuss high-performance implementations of a wide class of image processing applications on a low-power, massively parallel SIMD architecture, the ClearSpeed CSX700. We present parallel implementation results for four classes of image processing applications: feature detection (Harris Corner Detector), stereo vision (a class of SSD-like algorithms), model estimation (RANSAC), and object detection (based on Histogram of Oriented Gradients, HOG) on the CSX SIMD architecture. Our results indicate that this SIMD architecture is indeed a good candidate for achieving low-power supercomputing capability, as well as a rather satisfactory degree of flexibility for implementing various applications. We also compare our results, when applicable, with similar implementations on ASICs, FPGAs, and GPGPUs. This comparison clearly demonstrates that we achieve a much better absolute computational performance than ASICs and FPGAs, with a better relative performance per watt. Compared with GPGPUs, we achieve similar (and in some cases better) computational performance but with a significantly better relative performance per watt. We show that, by designing appropriate efficient parallel algorithms, this highly parallel SIMD architecture can be an excellent candidate for space-borne applications in which low-power, lightweight, high-performance computation is a major requirement.

ieee aerospace conference | 2009

A new efficient method for system structural analysis and generating Analytical Redundancy Relations

Amir Fijany; Farrokh Vatan

In this paper we present a new efficient algorithmic method for generating the Analytical Redundancy Relations (ARRs). ARRs are one of the crucial tools for model-based diagnosis as well as for optimizing, analyzing, and validating the system of sensors. However, despite the importance of the ARRs for both system diagnosis and sensor optimization, it seems that less attention has been paid to the development of systematic and efficient approaches for their generation. In this paper we discuss the complexity in derivation of ARRs and present a new efficient algorithm for their derivation. Given a system with a set of L ARRs, our algorithm achieves a complexity of O(L^4) for generating the ARRs. To our knowledge, this is the first algorithm with a polynomial complexity for derivation of ARRs. We also present the results of application of our algorithms, for generating the complete set of ARRs, to both synthetic and industrial examples.

european conference on parallel processing | 2010

Highly parallel implementation of Harris Corner detector on CSX SIMD architecture

Fouzhan Hosseini; Amir Fijany; Jean-Guy Fontaine

We present a much faster than real-time implementation of the Harris Corner Detector (HCD) on a low-power, highly parallel SIMD architecture, the ClearSpeed CSX700, with application to mobile robots and humanoids. HCD is a popular feature detector due to its invariance to rotation, scale, illumination variation and image noise. We have developed strategies for efficient parallel implementation of HCD on the CSX700, and achieved a performance of 465 frames per second (fps) for images of 640×480 resolution and 142 fps for 1280×720 resolution. For a typical real-time application at 30 fps, our fast implementation takes a very small fraction (less than 10%) of the available time for each frame, thus leaving enough time for performing other computations. Our results indicate that the CSX architecture is indeed a good candidate for achieving low-power supercomputing capability, as well as flexibility.
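The per-frame time budget mentioned above can be checked with a few lines of arithmetic. The sketch below uses only the frame rates quoted in the abstract; the helper name `frame_budget_fraction` is hypothetical. At 465 fps the detector takes about 2.15 ms, roughly 6.5% of a 33.3 ms frame at 30 fps, which matches the "less than 10%" figure; at 142 fps the same calculation gives about 21%.

```python
def frame_budget_fraction(achieved_fps: float, target_fps: float = 30.0) -> float:
    """Fraction of one target-rate frame period used by a single detector pass."""
    # Time per pass is 1/achieved_fps, the available period is 1/target_fps,
    # so their ratio simplifies to target_fps / achieved_fps.
    return target_fps / achieved_fps


# Frame rates as quoted in the abstract for the two image resolutions.
for label, fps in (("640x480", 465.0), ("1280x720", 142.0)):
    milliseconds = 1000.0 / fps
    share = frame_budget_fraction(fps)
    print(f"{label}: {milliseconds:.2f} ms per frame, {share:.1%} of a 30 fps budget")
```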

ieee aerospace conference | 2012

Highly parallel and fast implementation of stereo vision algorithms on MIMD many-core Tilera architecture

Saeed Safari; Amir Fijany; Francesco Diotalevi; Fouzhan Hosseini

In this paper we present a fast, and in some cases faster than real-time, implementation of a class of dense stereo vision algorithms including the sum of squared differences (SSD), SSD with left-right check, and SSD with multiple windows, on a low-power MIMD many-core architecture, Tilera. Stereo vision, a method to extract spatial depth information of a scene from a pair of stereo images, is performed as a primary task and first step in many computer vision applications, e.g. 3D modeling and obstacle detection/avoidance in autonomous vehicles. To reduce the effect of scene conditions in real environments and achieve robust error rejection, intensive computation for implementing a multiple-window scheme with left-right checking is required. Therefore, real-time implementation of these algorithms is a challenging problem, particularly in an embedded application. To the best of our knowledge, our results present the first implementation of any stereo vision algorithm on newly emerging MIMD many-core architectures. We have achieved a faster than real-time performance of 207, 118, and 30.45 frames per second for VGA (640×480) images with a disparity range of 16 for standard SSD, SSD with left-right checking, and SSD with 5 multiple window implementations, respectively. For HDTV (1280×720) images, we have achieved rather unique results of 71 and 35.75 frames per second for standard SSD and SSD with left-right checking implementations, respectively. Such excellent performance, along with the low power consumption of the Tilera architecture (less than 23 W), makes it an excellent candidate to achieve supercomputing-level capability for mobile computer vision applications. Experimental results also clearly demonstrate that the new many-core MIMD parallel architectures can indeed achieve excellent performance in low-level image processing computations while providing a high degree of flexibility and programmability.
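To illustrate the left-right check named above, here is a minimal sequential sketch in plain Python. It is not the Tilera implementation. It assumes the common convention that a left-image pixel at column x with disparity d matches the right-image pixel at column x - d, and the `INVALID` marker, the `tol` parameter, and the function name are hypothetical choices for this example.

```python
INVALID = -1  # marker for rejected pixels (an assumption of this sketch)


def left_right_check(disp_left, disp_right, tol=1):
    """Keep a left disparity only if the right disparity map agrees with it.

    disp_left[y][x]  : disparity of left pixel (x, y), matching right column x - d
    disp_right[y][x] : disparity of right pixel (x, y), matching left column x + d
    """
    height, width = len(disp_left), len(disp_left[0])
    checked = []
    for y in range(height):
        row = []
        for x in range(width):
            d = disp_left[y][x]
            xr = x - d  # column of the matching pixel in the right image
            consistent = (
                d >= 0
                and 0 <= xr < width
                and abs(disp_right[y][xr] - d) <= tol
            )
            row.append(d if consistent else INVALID)
        checked.append(row)
    return checked
```

Pixels that fail the check are typically occluded or mismatched, so rejecting them trades density of the depth map for reliability.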

international symposium on visual computing | 2009

Real-Time Parallel Implementation of SSD Stereo Vision Algorithm on CSX SIMD Architecture

Fouzhan Hosseini; Amir Fijany; Saeed Safari; Ryad Chellali; Jean-Guy Fontaine

We present a faster than real-time parallel implementation of the standard sum of squared differences (SSD) stereo vision algorithm on a SIMD architecture, the CSX700. To our knowledge, this is the first highly parallel implementation of this algorithm using 192 processing elements. For a disparity range of 16 pixels, we have achieved rates of 160 and 59 stereo pairs per second on 640×480 and 1280×720 images, respectively. Since this implementation is much faster than real time, it leaves enough time for performing other machine vision applications in real time. Our results demonstrate that the CSX architecture is a powerful processor for (low-level) computer vision applications. Due to its low power consumption, the CSX architecture can be a good candidate for mobile computer vision applications.
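For readers unfamiliar with SSD stereo matching, the following is a minimal sequential sketch in plain Python. It is not the CSX700 implementation and makes no attempt at parallelism. The function name, the `half_win` window parameter, and the convention that left column x matches right column x - d are assumptions of this example.

```python
def ssd_disparity(left, right, max_disp=16, half_win=1):
    """Brute-force SSD block matching on two grayscale images (lists of rows).

    For each interior left pixel, every candidate disparity d in [0, max_disp)
    is scored by the sum of squared differences over a square window, and the
    d with the lowest cost wins. Border pixels keep disparity 0.
    """
    height, width = len(left), len(left[0])
    disparity = [[0] * width for _ in range(height)]
    for y in range(half_win, height - half_win):
        for x in range(half_win, width - half_win):
            best_cost, best_d = None, 0
            # Limit d so the shifted window stays inside the right image.
            for d in range(min(max_disp, x - half_win + 1)):
                cost = 0
                for j in range(-half_win, half_win + 1):
                    for i in range(-half_win, half_win + 1):
                        diff = left[y + j][x + i] - right[y + j][x + i - d]
                        cost += diff * diff
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disparity[y][x] = best_d
    return disparity
```

The cost for each pixel and each disparity is independent of all others, which is the property that makes this algorithm a natural fit for data-parallel hardware.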

ieee aerospace conference | 2010

A novel method for derivation of Minimal Set of Analytical Redundancy Relations for system diagnosis

Amir Fijany; Farrokh Vatan

We present a novel concept of Minimal Set of Analytical Redundancy Relations (ARRs) and an efficient method for its calculation for application to system diagnosis. ARRs are one of the crucial tools for model-based diagnosis as well as for optimizing, analyzing, and validating the system of sensors. However, despite the importance of the ARRs for system diagnosis, it seems that less attention has been paid to their efficient application. In this paper, we first discuss the complexity of model-based diagnosis by using ARRs. We then present the concept of Minimal Set of ARRs which enables a faster system diagnosis by significantly reducing the number of ARRs to be evaluated for diagnosis purposes. We then show that the derivation of the minimal set of ARRs can be mapped to a 0–1 Integer Programming problem and present an efficient branch-and-bound algorithm for this derivation. We also present the results of application of our method for generating the minimal set of ARRs, to both synthetic and industrial examples, to show the significant reduction in the computational cost that can be achieved for system diagnosis.

ieee international conference on space mission challenges for information technology | 2009

A Novel Efficient Method for Conflicts Set Generation for Model-Based Diagnosis

Amir Fijany; F. Vatan; Anthony Barrett

In this paper we present a new efficient algorithmic method for generating the conflicts set for model-based diagnosis. Our new method combines the strengths of the two different approaches proposed in the literature, that is, fault detection and isolation (FDI), which is based on automatic control theory and statistical decision theory, and the other one, known as DX, which is based on artificial intelligence techniques. The first building block in our method is a new efficient algorithm for generation of the complete set of analytical redundancy relations (ARRs) for the system in an implicit form. For the diagnosis, our method first performs (similar to DX approaches) a system simulation to calculate the expected values of the measurements. Any discrepancy, i.e., a difference between the expected and actual value of a measurement, triggers our diagnosis process. To this end, only those ARRs which involve the measurement with a discrepancy are checked for consistency, which leads to a significant reduction in the number of consistency checks usually performed by DX approaches. We demonstrate the efficiency of our new method by its application to several synthetic systems and compare it with that of GDE.

ieee aerospace conference | 2012

A cooperative search algorithm for highly parallel implementation of RANSAC for model estimation on Tilera MIMD architecture

Amir Fijany; Francesco Diotalevi

In this paper, we present a novel and fast algorithm for highly parallel implementation of RANSAC on a many-core MIMD architecture, the Tilera. RANSAC is widely used in image processing applications for homography model estimation. It also represents one of the most computation-intensive image processing tasks since it requires evaluation of a large number of models from a given data set. Therefore, increasing the efficiency of its computation by exploiting a massive degree of parallelism is the key enabling factor for many of its applications. Emerging highly parallel architectures such as Tilera provide such an opportunity for exploiting parallelism in many computations. In addition to its low power consumption and excellent GOPS per watt performance, a radiation-hard version of Tilera has also been developed, which makes it one of the best candidates for future aerospace applications. In this paper, we first present a novel variant of RANSAC that incorporates the concept of backtracking. We then present this variant as a cooperative search algorithm with excellent features for highly parallel implementation. In fact, our parallel implementation results in an asynchronous algorithm with a very limited communication requirement. Any processor performs a global broadcast if and when it finds a partial solution better than the previous one. We present our results for an extensive set of data with varying proportions of outliers. Our practical results clearly demonstrate that excellent speedup in the computation is achieved by using 57 cores of the Tilera. In fact, for certain cases, our Cooperative Search Algorithms even achieve super-linear speedup, i.e., a speedup greater than 57. We argue that such a result could indeed have been expected and can be exploited in other applications.
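As background for the abstract above, here is a minimal sketch of textbook sequential RANSAC in plain Python. It deliberately swaps in a simpler problem: it fits a 2D line y = a*x + b instead of a homography, runs on one core, and includes neither the backtracking variant nor the cooperative broadcast described in the paper. The function name and the parameters `iterations`, `tol`, and `seed` are hypothetical.

```python
import random


def ransac_line(points, iterations=200, tol=0.5, seed=0):
    """Fit y = a*x + b to points containing outliers using basic RANSAC.

    Returns ((a, b), inliers) for the candidate line with the most inliers,
    or (None, []) if no valid candidate was sampled.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    best_model, best_inliers = None, []
    for _ in range(iterations):
        # 1. Hypothesize: a line needs a minimal sample of two points.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical pair cannot be written as y = a*x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # 2. Verify: count points within tol of the candidate line.
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= tol]
        # 3. Keep the best candidate seen so far.
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (a, b), inliers
    return best_model, best_inliers
```

Each iteration is independent of the others except for the final best-so-far comparison, which is why the work distributes well across many cores; in a parallel setting, that comparison is the natural point at which a worker would share an improved solution.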

ieee aerospace conference | 2011

Very low power parallel implementation of stereo vision algorithm on a solar cell powered MIMD many core architecture

Francesco Diotalevi; Amir Fijany; Michael Montvelishsky; Jean-Guy Fontaine

We present wavefront/systolic algorithms for efficient implementation of Stereo Vision (SV) computation on a novel and very low power many-core MIMD architecture, the IntellaSys S40C18. For Sum of Squared Differences (SSD) and Sum of Absolute Differences (SAD) SV algorithms with a disparity range of 16 pixels, we have achieved a performance of up to 25 frames per second (fps) for 348×288 images while consuming only 75 mW. To our knowledge, this seems to be one of the best results in terms of fps per watt for the SV computation. We have also developed and implemented a simple Obstacle Avoidance (OA) algorithm based on the depth map produced by the SV computation. We have achieved a performance of 21 steering maneuvers per second while consuming 72 mW of power. This very limited power consumption indeed enables the use of solar cells as the main source of power for the computing architecture. Such a high-performance, low-power computing system could enable new capabilities for many aerospace applications and encourage investigations into space qualification of the architecture.
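For reference, the two matching costs named above are commonly written as follows. The notation is an assumption of this note, not taken from the paper: I_L and I_R are the left and right images, W is the matching window, d is the candidate disparity, and D = 16 is the disparity range, with the left image as the reference.

```latex
\mathrm{SSD}(x, y, d) = \sum_{(i,j) \in W} \bigl( I_L(x+i,\, y+j) - I_R(x+i-d,\, y+j) \bigr)^2

\mathrm{SAD}(x, y, d) = \sum_{(i,j) \in W} \bigl| I_L(x+i,\, y+j) - I_R(x+i-d,\, y+j) \bigr|

\hat{d}(x, y) = \arg\min_{0 \le d < D} C(x, y, d), \qquad C \in \{\mathrm{SSD}, \mathrm{SAD}\}
```

SAD replaces the multiplication in each term with an absolute value, which is one reason it is often considered on very small, low-power cores.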

international symposium on visual computing | 2010

Fast parallel model estimation on the cell broadband engine

Ali Khalili; Amir Fijany; Fouzhan Hosseini; Saeed Safari; Jean-Guy Fontaine

In this paper, we present fast parallel implementations of the RANSAC algorithm on the Cell processor, a multicore SIMD architecture. We present our strategies for efficient parallel implementation of the RANSAC algorithm by exploiting the specific features of the Cell processor. We also discuss our new method for model generation to increase the efficiency of calculation of the homography transformation by RANSAC. In fact, by using this new method and change of algorithm, we have been able to increase the overall performance by a factor of almost 3. We also discuss in detail our approaches for further increasing the efficiency by a careful vectorization of the computation as well as by reducing the communication overhead by overlapping computation and communication. The results of our practical implementations clearly demonstrate that a very high sustained computational performance (in terms of sustained GFLOPS) can be achieved with a minimum of communication overhead, resulting in a capability of real-time generation and evaluation of a very large number of models. With a data set of 2048 points and 256 models, we have achieved a performance of over 80 sustained GFLOPS. Since the peak computing power of our target architecture is 179 GFLOPS, this represents a sustained performance of about 44% of the peak, indicating the efficiency of our algorithms and implementations. Our results clearly demonstrate the advantages of parallel implementation of RANSAC on MIMD-SIMD architectures such as the Cell processor. They also show that, by using such a parallel implementation instead of the sequential one, a problem with a fixed number of iterations (hypothetical models) can be solved much faster, leading to a potentially better accuracy of the model.