Atef Ibrahim | Researchain

Publication

Featured research published by Atef Ibrahim.

IEEE Transactions on Parallel and Distributed Systems | 2011

Processor Array Architectures for Scalable Radix 4 Montgomery Modular Multiplication Algorithm

Atef Ibrahim; Fayez Gebali; Hamed Elsimary; Amin Nassar

This paper presents a systematic methodology for exploring possible processor arrays for the scalable radix 4 Montgomery modular multiplication algorithm. In this methodology, the algorithm is first expressed as a regular iterative expression; the algorithm's data dependence graph and a suitable affine scheduling function are then obtained. Four possible processor arrays are obtained and analyzed in terms of speed, area, and power consumption. To reduce power consumption, low power techniques are applied to reduce glitches and the Expected Switching Activity (ESA) of high fan-out signals in the processor array architectures. The resulting processor arrays are compared to other efficient ones in terms of area, speed, and power consumption.
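As a rough illustration of the arithmetic these processor arrays implement, the sketch below computes a radix 4 Montgomery product in software, consuming two multiplier bits per iteration. It is a plain digit-serial model and not the word-based scalable architecture of the paper; the function name `mont_mul_radix4` and its parameters are assumptions made for this example.

```python
def mont_mul_radix4(x, y, n, digits):
    """Return x * y * 4**(-digits) mod n.

    Assumes n is odd and x, y < n < 4**digits (hypothetical interface).
    """
    if n % 2 == 0:
        raise ValueError("Montgomery multiplication needs an odd modulus")
    n_prime = (-pow(n, -1, 4)) % 4    # -n^(-1) mod 4, needs Python 3.8+
    s = 0
    for i in range(digits):
        x_i = (x >> (2 * i)) & 3      # next radix 4 digit of the multiplier
        s += x_i * y                  # accumulate the partial product
        q = (s * n_prime) & 3         # digit that makes s + q*n divisible by 4
        s = (s + q * n) >> 2          # exact division by the radix
    # The loop keeps s below 2n, so one conditional subtraction suffices.
    return s - n if s >= n else s
```

The result carries the usual Montgomery factor 4^(-digits), so operands are normally converted into and out of Montgomery form around a chain of such multiplications.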

IEEE Transactions on Very Large Scale Integration Systems | 2015

Systolic Array Architectures for Sunar–Koç Optimal Normal Basis Type II Multiplier

Atef Ibrahim; Fayez Gebali; Turki F. Al-Somani

We present linear and nonlinear techniques for design exploration of an iterative algorithm. The nonlinear techniques allow control of processor workload and control of communication between processors. The algorithm considered is the Sunar–Koç optimal normal basis type II multiplication algorithm. Six systolic arrays are obtained. General formulas are provided for each design so that the operation of the system can be determined for a given GF(2^m). The proposed architectures have been implemented using 45-nm CMOS technology and compared with published architectures. The results show that the proposed designs have at least 44.4% lower total computation time compared with the designs of all bit serial multipliers, while having slightly larger area delay product (ADP), up to 19.1%, compared with some of the bit serial multipliers and having smaller ADP values compared with most of the digit serial ones. Moreover, they have at least 46% lower power delay product compared with all bit serial and digit serial multipliers.

Journal of Signal Processing Systems | 2016

Low Power Semi-systolic Architectures for Polynomial-Basis Multiplication over GF(2^m) Using Progressive Multiplier Reduction

Atef Ibrahim; Fayez Gebali

We present low area and low power semi-systolic array architectures for polynomial basis multiplication over GF(2^m) using the Progressive Multiplier Reduction (PMR) technique. These architectures are explored using linear and nonlinear techniques applied to the polynomial multiplication algorithm. The nonlinear techniques allow the designer to control the processor workload and reduce inter-processor communication. The resulting semi-systolic architectures have a simple structure with local communication. ASIC implementations of our designs and comparable published designs show that the proposed scalable semi-systolic structures have lower area complexity (by 56.8–94.6%) and lower power consumption (by 55.2–84.2%), except when compared with a scalable design published by the same authors. However, one of the proposed scalable designs outperforms that design in throughput by 73.8%. This makes the proposed designs suited to embedded applications that require low power consumption and moderate speed.
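For readers unfamiliar with the underlying operation, the following minimal sketch shows bit-serial polynomial basis multiplication over GF(2^m) with the reduction interleaved into every step. This is the textbook MSB-first algorithm, used here only to illustrate the operation; it is not the PMR technique itself, and the name `gf2m_mul` and the integer bit-vector encoding are assumptions.

```python
def gf2m_mul(a, b, poly, m):
    """Multiply a and b in GF(2^m), polynomial basis.

    Field elements are integers whose bit i is the coefficient of x^i.
    `poly` is the irreducible polynomial including its x^m term.
    """
    r = 0
    for i in reversed(range(m)):      # scan multiplier bits, MSB first
        r <<= 1                       # r = r * x
        if (r >> m) & 1:              # degree reached m: reduce once
            r ^= poly
        if (b >> i) & 1:              # add (XOR) the multiplicand
            r ^= a
    return r
```

With the AES field polynomial x^8 + x^4 + x^3 + x + 1 (`0x11B`), `gf2m_mul(0x57, 0x83, 0x11B, 8)` gives `0xC1`, the worked example from FIPS 197.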

Microelectronics Journal | 2015

Optimized structures of hybrid ripple carry and hierarchical carry lookahead adders

Atef Ibrahim; Fayez Gebali

This paper proposes improved structures for fast adders that include carry lookahead (CLA) and hierarchical carry lookahead (HCLA). It also proposes optimized novel structures of hybrid ripple carry/hierarchical carry lookahead (RCA/HCLA) adders. A general methodology is presented for constructing M-bit hierarchical carry lookahead adders using n-bit modules. The only restriction on the values of M and n is n ≤ M. Two algorithms are developed to efficiently construct hierarchical carry lookahead adders for the case when M is not an integer power or an integer multiple of n. The improved hierarchical levels of carry lookahead adders are integrated with the ripple carry adder to construct the novel hybrid RCA/HCLA adders. Area and time complexities of the resulting designs are reported for different values of radix n and for the practical values M = 32 and M = 64 bits. An ASIC implementation of the proposed structures and previously published recent designs shows that one of the proposed hybrid RCA/HCLA adders achieves a 28.2–77.7% reduction in area–delay product and a 40.5–75.8% reduction in energy, for M = 64 and n = 8, over the compared adder designs.
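The general idea of mixing lookahead and ripple carry can be sketched behaviorally as below: carries inside each n-bit module come from the expanded generate/propagate equations, while the carry between modules ripples. This is a simplified single-level model under assumed names (`cla_block`, `hybrid_add`), not the hierarchical structures or construction algorithms of the paper. It does show one way to handle an M that is not a multiple of n, by letting the last module be narrower.

```python
def cla_block(a, b, c_in, width):
    """Add two width-bit values using expanded lookahead carry equations."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # propagate
    carries = [c_in]
    for i in range(width):
        # c[i+1] = g[i] | p[i]g[i-1] | ... | p[i]...p[0]c_in
        c, run = g[i], p[i]
        for j in range(i - 1, -1, -1):
            c |= run & g[j]
            run &= p[j]
        c |= run & c_in
        carries.append(c)
    s = 0
    for i in range(width):
        s |= (p[i] ^ carries[i]) << i                      # sum bit
    return s, carries[width]


def hybrid_add(a, b, m_bits, n_bits):
    """M-bit add from n-bit lookahead modules with ripple between modules."""
    result, carry, pos = 0, 0, 0
    while pos < m_bits:
        width = min(n_bits, m_bits - pos)   # last module may be narrower
        mask = (1 << width) - 1
        s, carry = cla_block((a >> pos) & mask, (b >> pos) & mask, carry, width)
        result |= s << pos
        pos += width
    return result, carry
```

For example, `hybrid_add(a, b, 32, 5)` uses six 5-bit modules followed by one 2-bit module.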

IEEE Transactions on Very Large Scale Integration Systems | 2015

Efficient Scalable Serial Multiplier Over GF(2^m) Based on Trinomial

Fayez Gebali; Atef Ibrahim

This brief presents a novel low-complexity scalable serial architecture for finite field multiplication over GF(2^m) based on an irreducible trinomial. This architecture was explored by applying a nonlinear technique that allows the designer, using a progressive product reduction technique, to control the workload per processor and to reduce the communication overhead between processors. A comparison of the ASIC implementation of the proposed structure with some previously published structures shows that the proposed structure has at least 71.7% lower area and at least 89.9% lower power than most of them. This makes the proposed design more suitable for constrained implementations of cryptographic primitives in resource-constrained applications, such as smart cards, handheld devices, and implantable medical devices.
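The appeal of an irreducible trinomial x^m + x^k + 1 is that reduction needs only two XORs per high-order bit, since x^m ≡ x^k + 1. The sketch below illustrates this in software, under the assumption of an integer bit-vector encoding and hypothetical function names; it does not model the serial architecture or the progressive product reduction scheme of the brief.

```python
def gf2_clmul(a, b):
    """Carry-less (XOR) product of two binary polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def reduce_trinomial(c, m, k):
    """Reduce c modulo x^m + x^k + 1 (0 < k < m), using x^m = x^k + 1."""
    for i in range(c.bit_length() - 1, m - 1, -1):
        if (c >> i) & 1:
            # Clear bit i and fold it into positions i-m+k and i-m.
            c ^= (1 << i) ^ (1 << (i - m + k)) ^ (1 << (i - m))
    return c


def gf2m_mul_trinomial(a, b, m, k):
    """Multiply in GF(2^m) defined by the trinomial x^m + x^k + 1."""
    return reduce_trinomial(gf2_clmul(a, b), m, k)
```

As a usage example, the NIST binary field of degree 233 uses x^233 + x^74 + 1, so `gf2m_mul_trinomial(1 << 232, 2, 233, 74)` returns `(1 << 74) | 1`.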

Canadian Journal of Electrical and Computer Engineering (Revue canadienne de génie électrique et informatique) | 2009

High-performance, low-power architecture for scalable radix 2 Montgomery modular multiplication algorithm

Atef Ibrahim; Fayez Gebali; Hamed Elsimary; Amin Nassar

This paper presents a new processor array architecture for the scalable radix 2 Montgomery modular multiplication algorithm. In this architecture, the multiplicand and the modulus words are allocated to each processing element rather than pipelined between the processing elements as in the previous architecture extracted by Ç. Koç. Also, the multiplier bits are fed serially to the first processing element of the processor array every odd clock cycle. Analysis of this architecture shows that it has better performance, in terms of area and speed, and lower power consumption than the previous architecture extracted by Ç. Koç.
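To make the word-based ("scalable") formulation concrete, the sketch below models radix 2 Montgomery multiplication with the multiplicand, modulus, and running sum held as w-bit words, while the multiplier is consumed one bit at a time. It is a sequential software model under assumed names and word layout, and it says nothing about how words are allocated to processing elements or scheduled in the paper.

```python
def to_words(v, w, e):
    """Split v into e little-endian words of w bits each."""
    return [(v >> (w * j)) & ((1 << w) - 1) for j in range(e)]


def mont_mul_radix2_words(x, y, n, bits, w=8):
    """Return x * y * 2**(-bits) mod n for odd n, with x, y < n < 2**bits."""
    mask = (1 << w) - 1
    e = (bits + w) // w                  # enough words for values below 2n
    Y, N, S = to_words(y, w, e), to_words(n, w, e), [0] * e
    for i in range(bits):
        x_i = (x >> i) & 1               # multiplier bit, fed serially
        q = (S[0] + x_i * Y[0]) & 1      # add n when the sum would be odd
        carry = 0
        for j in range(e):               # word-by-word accumulate
            t = S[j] + x_i * Y[j] + q * N[j] + carry
            S[j], carry = t & mask, t >> w
        for j in range(e - 1):           # shift right by one bit across words
            S[j] = (S[j] >> 1) | ((S[j + 1] & 1) << (w - 1))
        S[e - 1] = (S[e - 1] >> 1) | ((carry & 1) << (w - 1))
    s = sum(word << (w * j) for j, word in enumerate(S))
    return s - n if s >= n else s
```

Changing `w` changes only the word width of the datapath, not the result, which is the sense in which such designs scale with operand size.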

Pacific Rim Conference on Communications, Computers and Signal Processing | 2009

New processor array architecture for scalable radix 2 Montgomery modular multiplication algorithm

Atef Ibrahim; Fayez Gebali; Hamed Elsimary; Amin Nassar

This paper presents a new processor array architecture for the scalable radix 2 Montgomery modular multiplication algorithm. In this architecture, the multiplicand and the modulus words are allocated to each processing element rather than pipelined between the processing elements as in the previous architecture extracted by Ç. Koç, and the multiplier bits are fed serially to the first processing element of the processor array every odd clock cycle. Analysis of this architecture shows that it has better performance, in terms of area and speed, than the previous architecture extracted by Ç. Koç.

Microprocessors and Microsystems | 2016

Low space-complexity and low power semi-systolic multiplier architectures over GF(2^m) based on irreducible trinomial

Fayez Gebali; Atef Ibrahim

Propose low power bit-serial and digit-serial semi-systolic multiplier architectures over GF(2^m). Develop a new Progressive Product Reduction (PPR) technique. Develop affine and nonlinear task scheduling functions. Develop affine and nonlinear task projection onto processors. Provide ASIC implementation for proposed and previously published designs.

This paper proposes three bit-serial and digit-serial semi-systolic GF(2^m) multipliers using the Progressive Product Reduction (PPR) technique. These architectures are obtained by converting the GF(2^m) multiplication algorithm into an iterative algorithm using systematic techniques for scheduling the computational tasks and mapping them to Processing Elements (PEs). Three different semi-systolic arrays were obtained. ASIC implementations of the proposed designs and previously published schemes were used to verify the performance of the proposed designs. One proposed design has at least 29% lower area and at least 70% lower power than previously published bit/digit serial multipliers. Another proposed design has at least 12% lower power-delay product (PDP) than previously published bit/digit serial multipliers. This makes the proposed designs more suited to resource-constrained embedded applications.

IEEE Transactions on Parallel and Distributed Systems | 2017

[Title missing in source]

Awos Kanan; Fayez Gebali; Atef Ibrahim

We present a systematic methodology for exploring the design space of similarity distance computation in machine learning algorithms. Previous architectures proposed in the literature have been obtained using ad hoc techniques that do not allow for design space exploration. The size and dimensionality of the input datasets have not been taken into consideration in previous works. This may result in impractical designs that are not amenable to hardware implementation. The methodology presented in this work is used to obtain the 3-D computation domain of the similarity distance computation algorithm. A scheduling function determines whether an algorithm variable is pipelined or broadcast. Four linear scheduling functions are presented, and six possible 2-D processor array architectures are obtained and classified based on the size and dimensionality of the input datasets. The obtained designs are analyzed in terms of speed and area, and compared with previously obtained designs. The proposed designs achieve better time and area complexities.
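The 3-D computation domain mentioned above can be pictured as the index space (i, j, k) of a pairwise distance computation: sample i, sample j, and feature k. The sketch below makes that domain explicit with a triple loop. The abstract does not name a distance metric, so Manhattan distance is an assumption here. The helper `classify` follows the common convention in affine scheduling that a variable is broadcast when the scheduling vector is orthogonal to the variable's null vector and pipelined otherwise; both that rule and all names are assumptions, not details confirmed by the abstract.

```python
def pairwise_manhattan(samples):
    """Distance matrix over the 3-D domain (i, j, k); Manhattan metric assumed."""
    n, m = len(samples), len(samples[0])
    d = [[0] * n for _ in range(n)]
    for i in range(n):              # first sample index
        for j in range(n):          # second sample index
            for k in range(m):      # feature index: accumulation direction
                d[i][j] += abs(samples[i][k] - samples[j][k])
    return d


def classify(schedule, null_vector):
    """Assumed rule: s . e == 0 means broadcast, otherwise pipelined."""
    dot = sum(s * e for s, e in zip(schedule, null_vector))
    return "broadcast" if dot == 0 else "pipelined"
```

For instance, the operand `samples[i][k]` does not depend on j, so its null vector is (0, 1, 0); under the assumed rule, `classify((0, 0, 1), (0, 1, 0))` reports it as broadcast, while `classify((0, 1, 1), (0, 1, 0))` reports it as pipelined.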

Computers & Electrical Engineering | 2017

[Title missing in source]

Atef Ibrahim; Turki F. Al-Somani; Fayez Gebali

Propose a regular technique for exploring the unified hardware structure. Develop a novel unified systolic array structure for the multiplication and inversion algorithms over GF(2^m). Provide ASIC and FPGA implementations for the proposed and the previously published designs.

Unified Systolic Array Structure for Multiplication and Inversion Over GF(2^m).

This paper proposes a new unified systolic array architecture to perform multiplication and inversion operations in GF(2^m) based on the bit serial multiplication algorithm and the previously modified extended Euclidean algorithm. This architecture is explored by applying a regular technique to the multiplication and inversion algorithms. It has lower area and power complexities and achieves moderate speed. It also has a simple structure whose processing elements communicate only locally. The implementation results of the proposed design and the comparable published designs show that the proposed design saves more area (ranging from 18.8% to 23.0%) and more energy (ranging from 18.2% to 47.0%) than the compared efficient designs. This makes it more suitable for applications that impose tighter constraints on area and power consumption.
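As background for the inversion half of such a unified design, the sketch below inverts an element of GF(2^m) with the binary-polynomial extended Euclidean algorithm in its standard textbook form. It is not the modified variant the paper builds on; `gf2m_inv` and the helper `gf2m_mul` are hypothetical names, with field elements encoded as integers whose bit i is the coefficient of x^i.

```python
def gf2m_mul(a, b, poly, m):
    """Bit-serial GF(2^m) multiply; used here only to check the inverse."""
    r = 0
    for i in reversed(range(m)):
        r <<= 1
        if (r >> m) & 1:
            r ^= poly
        if (b >> i) & 1:
            r ^= a
    return r


def gf2m_inv(a, poly):
    """Inverse of nonzero a modulo the irreducible polynomial `poly`."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse in GF(2^m)")
    u, v = a, poly
    g1, g2 = 1, 0            # invariants: g1*a = u and g2*a = v (mod poly)
    while u != 1:
        j = u.bit_length() - v.bit_length()
        if j < 0:            # keep deg(u) >= deg(v)
            u, v = v, u
            g1, g2 = g2, g1
            j = -j
        u ^= v << j          # cancel the leading term of u
        g1 ^= g2 << j        # apply the same step to the cofactor
    return g1
```

In GF(2^4) with x^4 + x + 1, `gf2m_inv(0b0010, 0b10011)` returns `0b1001`, since x · (x^3 + 1) = x^4 + x ≡ 1.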
