Noureddine Chabini | Researchain

Publication

Featured research published by Noureddine Chabini.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2003

Methods for minimizing dynamic power consumption in synchronous designs with multiple supply voltages

Noureddine Chabini; Ismail Chabini; El Mostapha Aboulhamid; Yvon Savaria

We address the problem of minimizing dynamic power consumption under performance constraints by scaling down the supply voltage of computational elements off critical paths. We assume that the number of possible supply voltages and their values are known for each computational element. We focus on solving this problem on cyclic and acyclic graphs corresponding to synchronous designs, and we consider multiphase clocked sequential circuits derived using software pipelining techniques. In this paper, we present exact and heuristic methods to solve the problem. The proposed methods take the form of mathematical programming formulations and their associated solution algorithms. The exact methods are based on a mixed integer linear programming formulation of the problem; the heuristic methods are based on linear programming formulations derived from the exact formulation. The solution methods are analyzed experimentally, in terms of run time and effectiveness in finding designs with lower dynamic power, using circuits from the ISCAS89 benchmark suite. Power reductions as high as 69.75% were obtained compared with designs using the highest supply voltages. One of the heuristic methods yields near-optimal solutions, typically within 5% of the optimal solution. Low dynamic-power designs with no or only a few level converters are also obtained.
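The exact method above is a MILP. Purely to make the underlying decision problem concrete, the following Python sketch solves a toy instance by exhaustive search instead: each node of a small acyclic graph receives one of two supply voltages, and the cheapest assignment whose longest path still fits the clock period is kept. The voltage levels, the delay table, the V² energy model, and the function names are illustrative assumptions, not values or code from the paper, and level-converter cost is ignored.

```python
from itertools import product

# Illustrative assumptions (not from the paper): two supply levels, a normalized
# delay per level, and dynamic energy proportional to V^2 with unit capacitance.
VOLTAGES = (3.3, 2.4)
DELAY = {3.3: 1.0, 2.4: 1.6}
ENERGY = {v: v * v for v in VOLTAGES}

def longest_path(nodes, edges, delay):
    """Longest path delay in a DAG; `nodes` must be listed in topological order."""
    arrival = {}
    for n in nodes:
        preds = [arrival[u] for (u, v) in edges if v == n]
        arrival[n] = max(preds, default=0.0) + delay[n]
    return max(arrival.values())

def best_assignment(nodes, edges, clock_period):
    """Exhaustive stand-in for the MILP: lowest-energy voltage assignment
    whose longest path still fits within the clock period."""
    best = None
    for choice in product(VOLTAGES, repeat=len(nodes)):
        volt = dict(zip(nodes, choice))
        delay = {n: DELAY[volt[n]] for n in nodes}
        if longest_path(nodes, edges, delay) > clock_period:
            continue  # timing constraint violated
        energy = sum(ENERGY[volt[n]] for n in nodes)
        if best is None or energy < best[0]:
            best = (energy, volt)
    return best
```

For example, with `nodes = ["a", "b", "c"]`, `edges = [("a", "b")]`, and a clock period of 2.0, the critical pair `a`, `b` must stay at 3.3 V while the off-critical node `c` can drop to 2.4 V. The search space grows as |V|^n, which is why a MILP or LP-based heuristics are needed beyond toy sizes.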

IEEE Transactions on Very Large Scale Integration Systems | 2005

Unification of scheduling, binding, and retiming to reduce power consumption under timings and resources constraints

Noureddine Chabini; Wayne H. Wolf

Scheduling and binding are two tasks found in high-level synthesis of hardware as well as in software compilation. These tasks are performed on graphs that model the hardware, or the software to be compiled to run on a specific processor. Scheduling determines the start time of each node in the graph. Binding assigns each node in the graph to a specific computational element. Performing binding either before or after scheduling can rule out high-quality designs (hardware or binary code), particularly when designing for low power: failing to combine scheduling and binding can lead to designs with high switching activity and hence high power consumption. To the best of our knowledge, no existing approach unifies scheduling and binding with an exact algorithm to produce designs with reduced power consumption; known approaches to this problem are heuristics. The problem is NP-hard in general, since it is the composition of two NP-hard problems, and it has not yet been formulated in the literature. It becomes more complex for cyclic graphs and/or when constraints such as timing must be met. For cyclic graphs, retiming must be integrated into the unification of scheduling and binding. We propose a mathematical formulation of this problem. We extend this formulation to combine modulo scheduling, binding, and retiming under timing and resource constraints while reducing the power consumption due to switching activity. The proposed approach is tested on known benchmarks. The numerical results show that it reduces power consumption by 33.24% on average, with an average run time of 33.83 s.
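To show why binding decisions affect power, the sketch below uses one common proxy for switching activity, the Hamming distance between consecutive input words on the same functional unit, and searches bindings exhaustively for a fixed schedule. This cost model and the names `switching_cost` and `best_binding` are assumptions for illustration; the paper's formulation additionally optimizes the schedule and, for cyclic graphs, the retiming.

```python
from itertools import product

def hamming(x, y):
    """Number of bit positions in which two operand words differ."""
    return bin(x ^ y).count("1")

def switching_cost(schedule, binding, operand):
    """Toggles seen at each unit's input when its operations run in control-step order.
    schedule: node -> control step, binding: node -> unit, operand: node -> input word."""
    per_unit = {}
    for node in sorted(schedule, key=schedule.get):
        per_unit.setdefault(binding[node], []).append(node)
    return sum(hamming(operand[p], operand[c])
               for seq in per_unit.values() for p, c in zip(seq, seq[1:]))

def best_binding(schedule, units, operand):
    """Exhaustive binding for a fixed schedule; at most one operation per unit per step."""
    nodes = sorted(schedule)
    best = None
    for choice in product(units, repeat=len(nodes)):
        binding = dict(zip(nodes, choice))
        if len({(schedule[n], binding[n]) for n in nodes}) < len(nodes):
            continue  # resource conflict: two operations on one unit in one step
        cost = switching_cost(schedule, binding, operand)
        if best is None or cost < best[0]:
            best = (cost, binding)
    return best
```

With two units, operations `n1` (0b0000) and `n2` (0b1111) in step 0, and `n3` (0b1110) and `n4` (0b0001) in step 1, pairing similar words on the same unit costs 2 toggles, whereas the opposite pairing costs 6.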

Great Lakes Symposium on VLSI | 2003

Unification of basic retiming and supply voltage scaling to minimize dynamic power consumption for synchronous digital designs

Noureddine Chabini; Ismail Chabini; El Mostapha Aboulhamid; Yvon Savaria

We address the problem of minimizing dynamic power consumption for single-phase synchronous digital designs under timing constraints, using a unification of basic retiming and supply voltage scaling. We assume that the number of supply voltages and their values are known for each computational element. Our main objective is to relocate registers using basic retiming while maximizing the number of computational elements off critical paths that can operate under a low available supply voltage, thereby maximizing dynamic power savings. We address the problem at the system level and formulate it as a mixed integer linear program (MILP), which guarantees an exact optimal solution. We also devise an algorithm to compute bounds on the values assigned by basic retiming to each computational element. Besides helping to find the optimal solution, these bounds also reduce the run time needed to find it. The proposed approach can produce converter-free designs and can also minimize short-circuit power consumption. Experimental results show that dynamic power consumption can be reduced by 2.78% to 37.24% for single-phase designs with minimal clock period. For these experiments, the run time for solving the MILP was under 2 min.
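Basic retiming in the Leiserson-Saxe sense assigns an integer lag r(v) to each node and moves registers so that an edge u -> v carrying w registers ends up with w + r(v) - r(u); a retiming is legal only if every edge keeps a non-negative register count. The following minimal sketch applies a given lag vector and recomputes the clock period. It does not search for lags, compute the paper's lag bounds, or model supply voltages, all of which the paper's MILP handles; the function names are hypothetical.

```python
def retime(edges, r):
    """Apply basic retiming: w_r(u, v) = w(u, v) + r(v) - r(u).
    edges: list of (u, v, w) with w registers on edge u -> v; r: node -> integer lag."""
    new_edges = []
    for u, v, w in edges:
        wr = w + r[v] - r[u]
        if wr < 0:
            raise ValueError(f"illegal retiming: edge {u}->{v} would get {wr} registers")
        new_edges.append((u, v, wr))
    return new_edges

def clock_period(nodes, edges, delay):
    """Longest purely combinational path, i.e. along edges with zero registers.
    Assumes every cycle holds at least one register, so that subgraph is acyclic."""
    succ = {n: [v for u, v, w in edges if u == n and w == 0] for n in nodes}
    memo = {}

    def longest_from(n):
        if n not in memo:
            memo[n] = delay[n] + max((longest_from(v) for v in succ[n]), default=0)
        return memo[n]

    return max(longest_from(n) for n in nodes)
```

For a three-node ring `a -> b -> c -> a` with unit delays and both registers on `c -> a`, the period is 3; the lag `r(c) = 1` moves one register onto `b -> c`, lowering the period to 2 while keeping two registers on the cycle.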

2011 Faible Tension Faible Consommation (FTFC) | 2011

Low power and fast DCT architecture using multiplier-less method

M. El Aakif; Said Belkouch; Noureddine Chabini; Moha M'rabet Hassani

In this paper, a low-power and fast DCT (Discrete Cosine Transform) architecture using a multiplier-less method is presented, with a new modified FGA (Flow-Graph Algorithm) derived from our previously presented FGA of the DCT based on the Loeffler algorithm. The multiplier-less method replaces multiplications with a minimum number of additions and shifts. The proposed FGA is implemented and compared with the previous one. FPGA implementations on an Altera Cyclone II show an increase in maximum frequency, a decrease in resource usage, and a 7.2% reduction in dynamic power at a 120 MHz clock frequency with the newly proposed FGA. A further comparison with recently published results confirms the efficiency of the proposed FGA.
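Multiplier-less constant multiplication is commonly realized by recoding the constant in canonical signed-digit (CSD) form, so that each non-zero digit costs one shifted addition or subtraction. The sketch below illustrates that general idea only; it is not the paper's modified FGA, and the 8-bit fixed-point constant 181 ≈ 256·cos(π/4) is just an example. The function names are hypothetical.

```python
def csd_digits(c):
    """Canonical signed-digit recoding of a non-negative integer c.
    Returns digits in {-1, 0, 1}, least significant first, with no two adjacent non-zeros."""
    digits = []
    while c:
        if c & 1:
            d = 2 - (c & 3)  # +1 if c % 4 == 1, -1 if c % 4 == 3
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits

def shift_add_multiply(x, c):
    """Multiply integer x by constant c using only shifts, additions and subtractions."""
    acc = 0
    for shift, d in enumerate(csd_digits(c)):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc
```

For 181 the recoding is 1 + 4 - 16 - 64 + 256, i.e. five non-zero digits, so a hardware version would need four adders/subtractors and only wired shifts.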

IET Computers and Digital Techniques | 2007

Optimised realisations of large integer multipliers and squarers using embedded blocks

Shuli Gao; Noureddine Chabini; Dhamin Al-Khalili; J. M. Pierre Langlois

An efficient design methodology and a systematic approach for implementing multiplication and squaring functions for large unsigned integers using small-size embedded multipliers are presented. A general architecture of the multiplier and squarer is proposed, and a set of equations is derived to aid in the realisation. The inputs of the multiplier and squarer are split into several segments, leading to efficient utilisation of the small-size embedded multipliers and a reduced number of required addition operations. Various benchmarks were tested with the number of segments ranging from 2 to 5, targeting Xilinx Spartan-3 FPGAs. Synthesis was performed with the Xilinx ISE 7.1 XST tool, and the approach was compared with the traditional technique using the same tool. The results show that the design approach is very efficient in terms of both timing and area savings. Combinational delay is reduced by an average of 7.71% for the multiplier and 21.73% for the squarer. In terms of 4-input look-up tables, area is lowered by an average of 11.63% for the multiplier and 52.22% for the squarer. For the multiplier, both approaches use the same number of embedded multipliers. For the squarer, the proposed approach reduces the number of required embedded multipliers by an average of 32.77% compared with the traditional technique.
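The segmentation idea can be sketched in Python: each operand is cut into fixed-width segments, every segment product stands for one embedded-multiplier instance, and the shifted products are summed. For squaring, A[i]·A[j] and A[j]·A[i] are equal, so each cross product can be computed once and doubled with a shift, which is one plausible source of multiplier savings in squarers. The 17-bit segment width (assumed here for unsigned operands on 18×18 signed embedded multipliers) and the function names are assumptions; the paper's equations and addition scheme are not reproduced.

```python
SEG = 17  # assumed segment width: unsigned 17-bit operands fit an 18x18 signed multiplier

def split(x, nseg, seg=SEG):
    """Cut a non-negative integer into nseg segments of seg bits, least significant first."""
    mask = (1 << seg) - 1
    return [(x >> (i * seg)) & mask for i in range(nseg)]

def segmented_multiply(a, b, nseg, seg=SEG):
    """Unsigned a*b from nseg*nseg small products; returns (product, multipliers_used)."""
    A, B = split(a, nseg, seg), split(b, nseg, seg)
    total = 0
    for i in range(nseg):
        for j in range(nseg):
            total += (A[i] * B[j]) << ((i + j) * seg)
    return total, nseg * nseg

def segmented_square(a, nseg, seg=SEG):
    """Unsigned a*a; each symmetric cross term A[i]*A[j] (i < j) is computed once and doubled."""
    A = split(a, nseg, seg)
    total, used = 0, 0
    for i in range(nseg):
        for j in range(i, nseg):
            p = A[i] * A[j]
            used += 1
            if i != j:
                p <<= 1  # doubling is a shift, not an extra multiplier
            total += p << ((i + j) * seg)
    return total, used
```

With four segments, this naive multiplier uses 16 small products and the squarer 10; the paper's reported savings come from its own architecture, not from this simple count.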

2007 IEEE Northeast Workshop on Circuits and Systems | 2007

Optimized realization of large-size two’s complement multipliers on FPGAs

Shuli Gao; Dhamin Al-Khalili; Noureddine Chabini

This paper presents an optimized design approach for large two's complement multipliers using the embedded multipliers in FPGAs. The realization is based on the Baugh-Wooley algorithm. To achieve an efficient implementation, a set of optimized schemes for adding the partial products is proposed. The multipliers have been implemented for operands with sizes from 20 to 128 bits. The results indicate that the proposed approach outperforms traditional methods by as much as 50% in terms of the LUT-delay product.
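The Baugh-Wooley scheme rewrites the negatively weighted partial products of a two's complement multiplication as inverted bits plus constant corrections, so the whole array can be summed with ordinary unsigned adders. The Python sketch below builds that bit-level array for a small word length n and can be checked against native multiplication; it shows the algorithm the paper builds on, not the paper's optimized partial-product addition schemes, and the function names are hypothetical.

```python
def bit(x, i):
    """Bit i of a non-negative integer x."""
    return (x >> i) & 1

def baugh_wooley(a, b, n):
    """n-bit two's complement a*b via the modified Baugh-Wooley partial-product array.
    Every term added is non-negative, so an unsigned adder array suffices."""
    mask = (1 << n) - 1
    ua, ub = a & mask, b & mask  # raw two's complement bit patterns
    acc = 0
    for i in range(n - 1):
        for j in range(n - 1):
            acc += (bit(ua, i) & bit(ub, j)) << (i + j)
    acc += (bit(ua, n - 1) & bit(ub, n - 1)) << (2 * n - 2)
    for i in range(n - 1):
        # Inverted partial products of the two sign rows replace the negative terms.
        acc += (1 - (bit(ua, i) & bit(ub, n - 1))) << (i + n - 1)
        acc += (1 - (bit(ua, n - 1) & bit(ub, i))) << (i + n - 1)
    acc += (1 << n) + (1 << (2 * n - 1))  # correction constants
    acc &= (1 << (2 * n)) - 1             # keep the 2n result bits
    return acc - (1 << (2 * n)) if acc >> (2 * n - 1) else acc  # reinterpret as signed
```

The constants 2^n and 2^(2n-1) come from rewriting each negative bit term -x·2^k as (1 - x)·2^k - 2^k and reducing the accumulated -2^k terms modulo 2^(2n).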

2006 IEEE North-East Workshop on Circuits and Systems | 2006

Efficient Realization of Large Integer Multipliers and Squarers

Shuli Gao; Noureddine Chabini; Dhamin Al-Khalili; Pierre Langlois

This paper presents an efficient design methodology and a systematic approach for implementing multiplication and squaring functions for large integers using small-size embedded multipliers. A general architecture of the multiplier and squarer is proposed, and a set of equations is derived to aid in the realization. The inputs of the multiplier and squarer are split into several segments, leading to efficient utilization of the small-size embedded multipliers and a reduced number of required addition operations. Various benchmarks were tested with the number of segments ranging from 2 to 4, targeting the Xilinx Spartan-3 FPGA. Synthesis was performed with the Xilinx ISE 7.1 XST tool, and our approach was compared with the traditional technique using the same tool. The results show that our design approach is very efficient in terms of both timing and area savings. The combinational delay is reduced by an average of 6.1% for the multiplier and 15.5% for the squarer. The area saving, in terms of the number of 4-input LUTs, is about 8.3% for the multiplier and 50% for the squarer. For the multiplier, both approaches use the same number of embedded multipliers. For the squarer, our proposed approach reduces the number of required embedded multipliers by an average of 30.5% compared with the traditional technique.

International Conference on Microelectronics | 2009

Implementation of large size multipliers using ternary adders and higher order compressors

Shuli Gao; Dhamin Al-Khalili; Noureddine Chabini

Recent FPGA architectures facilitate the efficient mapping of high-order compressors to implement multi-operand additions. This feature can be used to improve the performance and area utilization of large multipliers. In this paper, we present an improved design approach that uses ternary adders and Generalized Parallel Compressors (GPCs) to add the partial products. Multipliers of sizes ranging from 80 to 170 bits were implemented on Altera's Stratix III devices. The results of the proposed scheme are compared with standard ripple-adder-based multipliers. On average, a delay reduction of 17.7% and an area saving of 56.53% were achieved when using ternary adders. Using GPCs with a one-level ternary adder, the average delay reduction is 18.7% and the average area saving is 24.1%.
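Two ingredients of this approach can be modelled behaviourally in Python: a (6;3) GPC, which counts the ones among six bits of a single column and emits a 3-bit result, and a tree of multi-input adders that sums the partial-product rows. The sketch only counts adder levels and adder instances to show why three-input adders shorten the tree; it does not model FPGA timing, area, or the paper's compressor arrangement, and the function names are hypothetical.

```python
def gpc_6_3(bits):
    """(6;3) generalized parallel counter: six bits of one column -> 3-bit count.
    Output bits are least significant first (weights 1, 2, 4 relative to the column)."""
    if len(bits) != 6:
        raise ValueError("a (6;3) GPC takes exactly six input bits")
    c = sum(bits)
    return [c & 1, (c >> 1) & 1, (c >> 2) & 1]

def adder_tree(operands, arity):
    """Sum a non-empty list of operands with a tree of `arity`-input adders.
    Returns (sum, number_of_levels, number_of_adders)."""
    rows = list(operands)
    levels = adders = 0
    while len(rows) > 1:
        nxt = []
        for k in range(0, len(rows), arity):
            group = rows[k:k + arity]
            if len(group) > 1:
                adders += 1  # a lone leftover row passes through without an adder
            nxt.append(sum(group))
        rows = nxt
        levels += 1
    return rows[0], levels, adders
```

For nine partial-product rows, a binary tree needs 4 levels and 8 adders, whereas a ternary tree needs 2 levels and 4 adders.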

International Journal of Reconfigurable Computing | 2009

Efficient scheme for implementing large size signed multipliers using multigranular embedded DSP blocks in FPGAs

Shuli Gao; Dhamin Al-Khalili; Noureddine Chabini

Modern FPGAs contain embedded DSP blocks that can be configured as multipliers of more than one size. FPGA-based designs using these multigranular embedded blocks become more challenging when high speed and reduced area utilization are required. This paper proposes an efficient design methodology for implementing large signed multipliers using multigranular small embedded blocks. The proposed approach has been implemented and tested targeting Altera's Stratix II FPGAs with the Quartus II software tool. The multipliers have been implemented for operands with sizes ranging from 40 to 256 bits. Experimental results show that our design approach outperforms the standard scheme used by the Quartus II tool in terms of both speed and area. On average, the delay reduction is about 20.7% and the area saving, in terms of ALUTs, is about 67.6%.
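A signed operand can be split so that only its most significant segment carries the sign, while the lower segments are plain unsigned fields; the product is then the sum of shifted segment products, each small enough for one embedded block. The segment widths used below (18, 18, and 4 bits for a 40-bit operand) are an arbitrary illustrative choice, not the paper's mapping onto the multigranular block sizes, and a real mapping must also respect each block's signed/unsigned operand modes. The function names are hypothetical.

```python
def split_signed(x, widths):
    """Split two's complement x into (segment_value, shift) pairs, least significant first.
    Lower segments are unsigned; the top segment is interpreted as signed."""
    segs, shift = [], 0
    for k, w in enumerate(widths):
        s = (x >> shift) & ((1 << w) - 1)
        if k == len(widths) - 1 and s >> (w - 1):  # top segment carries the sign
            s -= 1 << w
        segs.append((s, shift))
        shift += w
    return segs

def multigranular_multiply(a, b, widths_a, widths_b):
    """Signed a*b as a sum of small signed/unsigned products of varying sizes.
    a and b must fit in sum(widths_a) and sum(widths_b) bits (two's complement)."""
    total = 0
    for sa, ha in split_signed(a, widths_a):
        for sb, hb in split_signed(b, widths_b):
            total += (sa * sb) << (ha + hb)
    return total
```

With `widths = [18, 18, 4]`, a 40-bit operand becomes two unsigned 18-bit fields plus a signed 4-bit top field, giving nine partial products of three different sizes.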

International Conference on Multimedia Computing and Systems | 2011

FPGA implementation of a pipelined 2D-DCT and simplified quantization for real-time applications

Hatim Anas; Said Belkouch; M. El Aakif; Noureddine Chabini

The Discrete Cosine Transform (DCT) is one of the most widely used techniques for image compression, and several algorithms have been proposed to implement the 2D-DCT. The scaled SDCT algorithm is an optimization of the 1D-DCT that gathers all the multiplications at the end. In this paper, in addition to the hardware implementation on an FPGA, an extended optimization is performed by merging these multiplications into the quantization block without affecting image quality. Tests in the MATLAB environment show that the proposed approach produces images with quality comparable to those obtained using the JPEG standard. FPGA-based implementations of the proposed approach and of Loeffler's algorithm are presented and compared using an Altera Stratix FPGA with the Quartus II synthesis and implementation tool. The results show that our approach outperforms the well-known Loeffler algorithm in terms of processing speed and resource usage.
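The merging step can be illustrated on a 1-D, 8-point DCT: if the transform is computed without its normalization factors s_k, those factors can be folded into a precomputed quantization table Q'_k = Q_k / s_k, removing one multiplication per coefficient at run time. This Python sketch uses the direct O(N²) cosine sum rather than the scaled SDCT or Loeffler flow graph, and the quantization table is a placeholder argument, so it only demonstrates the algebra of the merge; the function names are hypothetical.

```python
import math

N = 8

def unscaled_dct(x):
    """DCT-II without normalization: Y_k = sum_n x_n * cos(pi * (2n + 1) * k / (2N))."""
    return [sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N))
            for k in range(N)]

# Orthonormal scale factors s_k, so that the orthonormal DCT is X_k = s_k * Y_k.
SCALE = [math.sqrt(1 / N)] + [math.sqrt(2 / N)] * (N - 1)

def quantize_separately(x, qtable):
    """Scale after the transform, then quantize: round(s_k * Y_k / Q_k)."""
    return [round(s * y / q) for s, y, q in zip(SCALE, unscaled_dct(x), qtable)]

def quantize_merged(x, qtable):
    """Fold s_k into the quantizer: round(Y_k / Q'_k) with Q'_k = Q_k / s_k.
    In hardware, Q'_k (or its reciprocal) would be a precomputed constant."""
    merged = [q / s for s, q in zip(SCALE, qtable)]
    return [round(y / m) for y, m in zip(unscaled_dct(x), merged)]
```

Both functions compute the same quantity up to floating-point rounding; the merged form simply moves the s_k multiplications out of the transform datapath and into constants.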
