Mohamed Nekili
Concordia University
Publication
Featured research published by Mohamed Nekili.
IEEE Transactions on Circuits and Systems | 2008
Falah Awwad; Mohamed Nekili; Mohamad Sawan
Parallel repeaters are proven to outperform serial repeaters in terms of delay, power and silicon area when regenerating signals in system-on-chip (SoC) interconnects. In order to avoid fundamental weaknesses associated with previously published parallel repeater-insertion models, this paper presents a new mathematical model for parallel repeater-insertion methodologies in SoC interconnects. The proposed methodology is based on modeling the repeater pull-down resistance in parallel with the interconnect. Also, to account for the effect of interconnect inductance, two moments were used in the transfer function, as opposed to previous Elmore delay models, which consider only one moment for RC interconnects. A direct consequence of this new type of modeling is an increased challenge in the mathematical modeling of interconnects. HSpice electrical and C++/MATLAB simulations are conducted to assess the performance of the proposed optimization methodology using a 0.25-μm CMOS technology. Simulation results show that this repeater-insertion methodology can be used to optimize SoC interconnects in terms of propagation delay, and provide VLSI/SoC designers with optimal design parameters, such as the type as well as the position and size of repeaters to be used for interconnect regeneration, faster than with conventional HSpice simulations.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2005
Haydar Saaied; Dhamin Al-Khalili; Asim J. Al-Khalili; Mohamed Nekili
The need for incremental algorithms to implement engineering changes (ECs) in clock trees (CTs) is critical in the system-on-a-chip (SoC) design cycle. An algorithm, called adaptive wire adjustment (AWA), is proposed to minimize the clock skew iteratively to any given bound. In order to speed up AWA's convergence, a local topology-modification (LTM) technique is incorporated into AWA. Moreover, LTM incorporation into AWA results in total wire-length reduction as well. Also, the incorporation of the LTM technique into the deferred-merge embedding (DME) algorithm and Greedy-DME (GDME) helps reduce the total wire length by around 7.8% and 9.8%, respectively. Additionally, applying LTM to GDME reduces wire elongations and the standard deviation of the path lengths (SDPL) between clock pins by 96.4% and 51.5%, respectively.
The 2nd Annual IEEE Northeast Workshop on Circuits and Systems, 2004. NEWCAS 2004. | 2004
Syed Rafay Hasan; Alexandre Landry; Yvon Savaria; Mohamed Nekili
Hypertransport (HT) is an emerging system integration communication technology. There is a need for a network on chip (NOC) technology compatible with this cutting-edge system technology. This paper reveals design constraints involved in implementing an HT-compatible NOC. In order to provide a simple and HT-compatible solution, we propose an architecture called hypertransport super lite (HTSL). The new architecture allows a reduction of more than 14 times in required buffer space, while keeping the functionality unaffected. Moreover, exploiting the advantages of on-chip architecture leverages the processing complexity of each node in the architecture.
international symposium on circuits and systems | 2005
Alexandre Landry; Mohamed Nekili; Yvon Savaria
The paper proposes a novel multi-layer AMBA high-speed bus (AHB) infrastructure designed to sustain a clock frequency of more than 2 GHz, which remarkably provides up to 4 giga data transfers per second of throughput. The interconnect matrix is achieved through a collection of high-performance bridges that serialize transfers toward a high-throughput shared-memory. As a result, we guarantee a maximum of one cycle communication latency to sixteen 125 MHz processors connected to our infrastructure. The proposed solution has been designed and verified with Cadence tools using a 0.18 μm CMOS technology.
asia and south pacific design automation conference | 2003
Haydar Saaied; Dhamin Al-Khalili; Asim J. Al-Khalili; Mohamed Nekili
In this paper, we suggest an adaptive approach for the Clock Distribution Network (CDN) to cope with a modification in the VLSI system design. The CDN's wires are adjusted iteratively to reduce the skew that results from a minor modification in the clock pins of a complex VLSI system. Such skew can be remedied by selecting a Balancing Node (BN) and adjusting its edges so that the skew gets smaller. The required edge adjustments are determined using the Elmore delay model. The performance of the algorithm is investigated using different random sets of clock pins. Also, the algorithm is tested by altering some clock pins in a zero-skew CDN. For small modifications in a large number of nodes in the CDN, our algorithm can achieve zero skew with fewer iterations than linear-order algorithms.
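As a rough illustration of the Elmore delay model mentioned in this abstract, the Python sketch below computes root-to-sink delays and the resulting skew for a small lumped RC tree. It is not the paper's algorithm: the function names, the dictionary-based tree representation, and the lumped (rather than distributed) wire model are all assumptions made for the example.

```python
from collections import defaultdict

def elmore_delays(parent, resistance, capacitance):
    """Elmore delay from the root to every node of a lumped RC tree.

    parent:      node -> parent node (the root maps to None)
    resistance:  node -> resistance of the edge from its parent (ohms)
    capacitance: node -> lumped capacitance at the node (farads)
    (Hypothetical representation chosen for this sketch.)
    """
    children = defaultdict(list)
    root = None
    for node, par in parent.items():
        if par is None:
            root = node
        else:
            children[par].append(node)

    # Total capacitance at and below each node.
    downstream = {}
    def total_cap(node):
        downstream[node] = capacitance[node] + sum(
            total_cap(child) for child in children[node])
        return downstream[node]
    total_cap(root)

    # Each edge adds (edge resistance) x (capacitance it must charge).
    delays = {root: 0.0}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children[node]:
            delays[child] = delays[node] + resistance[child] * downstream[child]
            stack.append(child)
    return delays

def skew(delays, sinks):
    """Difference between the slowest and fastest sink."""
    values = [delays[s] for s in sinks]
    return max(values) - min(values)

# Toy tree: root -> n -> {a, b}, with unit values for readability.
parent = {"root": None, "n": "root", "a": "n", "b": "n"}
resistance = {"n": 1.0, "a": 1.0, "b": 2.0}
capacitance = {"root": 0.0, "n": 1.0, "a": 1.0, "b": 1.0}

delays = elmore_delays(parent, resistance, capacitance)
print(delays["a"], delays["b"], skew(delays, ["a", "b"]))
```

In this toy tree the edge to `b` has twice the resistance of the edge to `a`, so the two sinks see different delays. A skew-reduction step in the spirit of the abstract would adjust the edges below the shared node `n` until the two delays match. In a real wire, a length change alters both resistance and capacitance, which this lumped sketch ignores.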
international conference on microelectronics | 2004
Alexandre Landry; Y. Savaria; Mohamed Nekili
This paper presents an on-chip interconnection infrastructure based on ARM's AHB standard to obtain a bus working beyond one gigahertz. All major design blocks necessary to implement reliable interconnect infrastructures for DSP platforms are presented. This interconnect infrastructure is implemented as a hard IP module to get the maximum performance out of TSMC's 0.18 μm CMOS technology. As a result, a bus operating at 1.4 GHz capable of transferring 2.8 giga data items per second was successfully designed.
international conference on microelectronics | 2001
Falah Awwad; Mohamed Nekili
On-chip inductance has become significant in the design of high-speed interconnects. In this paper, an RLC interconnect is studied in three configurations: regenerated in series, regenerated in parallel, and without regeneration. Simulations using a 0.25 μm TSMC technology show that the parallel regeneration starts achieving a better speed than the non-regenerated line at wire lengths smaller than that achieved when the wire is serially regenerated. It also features 47% time delay saving and 96% area-delay product saving over the serial regeneration.
great lakes symposium on vlsi | 2004
Adhir Upadhyay; Syed Rafay Hasan; Mohamed Nekili
With ongoing advances in semiconductor technology, power dissipation has been moving higher on the list of VLSI design constraints. In most high-performance synchronous VLSI designs, the distribution of low-skew global clock signals approaching the gigahertz range is the single largest source of power consumption. The GALS design style offers a solution to this issue by dividing a synchronous design into smaller locally synchronous sub-blocks. Smaller sub-blocks reduce capacitance in clock distribution networks because they need fewer H-tree levels. However, this implies a large number of sub-blocks, which increases the asynchronous power overhead. This work investigates these GALS power tradeoffs. This is, to our knowledge, the first paper to propose closed-form models for the optimum number of partitions that gives minimum power for a GALS array of identical processors. The models can serve as a useful firsthand guideline for designers in initial design stages. Experimental results verify the effectiveness of the model.
The 2nd Annual IEEE Northeast Workshop on Circuits and Systems, 2004. NEWCAS 2004. | 2004
A. Upadhyay; Syed Rafay Hasan; Mohamed Nekili
Globally asynchronous locally synchronous (GALS) design style has evolved as a solution to increasing problems of distributing clock at high frequency in DSM technology. Most wrapper designs proposed in some recent literature are based on bundled data protocols and suffer from the same timing closure problem as synchronous designs. Delay insensitive (DI) protocols offer a solution to this problem. However, most of the work on DI schemes was limited to asynchronous circuits so far. This is, to our knowledge, the first paper that presents a complete asynchronous wrapper architecture for GALS designs based on a DI protocol. It uses 1-of-4 data encoding with single-track handshaking. The resulting circuit shows a throughput of 1.66 Gbps, significantly higher than previous asynchronous DI templates.
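For readers unfamiliar with the 1-of-4 data encoding named in this abstract, the sketch below shows the basic idea in Python: every 2-bit symbol is carried on a group of four wires, exactly one of which is active. The function names and the most-significant-pair-first ordering are assumptions for illustration, and the single-track handshaking used by the paper's wrapper is not modeled.

```python
def encode_1of4(value, width_bits):
    """Encode an unsigned integer as a list of one-hot 4-wire groups.

    Each group carries two bits; the index of the active wire is the
    2-bit symbol. Groups are ordered most-significant pair first
    (an assumption made for this sketch).
    """
    if width_bits <= 0 or width_bits % 2:
        raise ValueError("width_bits must be a positive even number")
    if value < 0 or value >= (1 << width_bits):
        raise ValueError("value does not fit in width_bits")
    groups = []
    for shift in range(width_bits - 2, -1, -2):
        symbol = (value >> shift) & 0b11
        groups.append(tuple(1 if i == symbol else 0 for i in range(4)))
    return groups

def decode_1of4(groups):
    """Recover the integer; reject groups that are not one-hot."""
    value = 0
    for wires in groups:
        if len(wires) != 4 or sum(wires) != 1:
            raise ValueError("invalid 1-of-4 codeword")
        value = (value << 2) | wires.index(1)
    return value

codeword = encode_1of4(0b1001, 4)
print(codeword)               # [(0, 0, 1, 0), (0, 1, 0, 0)]
print(decode_1of4(codeword))  # 9
```

Because exactly one wire per group is active for valid data, a receiver can detect that a symbol has arrived from the wires themselves, without a separate timing assumption. This property is what makes such codes a natural fit for delay-insensitive protocols.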
great lakes symposium on vlsi | 2002
Falah Awwad; Mohamed Nekili
Repeaters are now widely used to enhance the performance of long on-chip interconnects in CMOS VLSI. For RC-modeled interconnects, parallel repeaters have proved to be superior to serial ones. In this paper, a Variable-Segment Regeneration Technique is introduced and compared with a Variable-Driver Parallel Technique, a recently proposed transparent repeater and three conventional techniques. HSpice simulations using a 0.25 μm TSMC technology show that both the variable-segment and variable-driver techniques feature 62% time delay saving and 354% Area-Delay product saving over the transparent repeater, and are superior to all conventional techniques. However, our new variable-segment technique is characterized by a 116% Area-Delay product saving over the variable-driver technique, making it the best-performing technique in the field of high-performance RLC interconnect regeneration. The simulation results confirm the superiority of the parallel regeneration technique over the serial ones.