Saleh Abdel-Hafeez

Jordan University of Science and Technology

Publication

Featured research published by Saleh Abdel-Hafeez.

IEEE Transactions on Very Large Scale Integration Systems | 2013

Scalable Digital CMOS Comparator Using a Parallel Prefix Tree

Saleh Abdel-Hafeez; Ann Gordon-Ross; Behrooz Parhami

We present a new comparator design featuring wide-range and high-speed operation using only conventional digital CMOS cells. Our comparator exploits a novel scalable parallel prefix structure that leverages the comparison outcome of the most significant bit, proceeding bitwise toward the least significant bit only when the compared bits are equal. This method reduces dynamic power dissipation by eliminating unnecessary transitions in a parallel prefix structure that generates the N-bit comparison result after (log₄ N) + (log₁₆ N) + 4 CMOS gate delays. Our comparator is composed of locally interconnected CMOS gates with a maximum fan-in and fan-out of five and four, respectively, independent of the comparator bitwidth. The main advantages of our design are high speed and power efficiency, maintained over a wide range. Additionally, our design uses a regular reconfigurable VLSI topology, which allows analytical derivation of the input-output delay as a function of bitwidth. HSPICE simulation for a 64-b comparator shows a worst case input-output delay of 0.86 ns and a maximum power dissipation of 7.7 mW using 0.15-μm TSMC technology at 1 GHz.
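The comparison principle and the delay expression in the abstract above can be illustrated with a small software model. This is only a sketch under stated assumptions: the function names `compare_msb_first` and `gate_delays` are hypothetical, and the sequential loop models the decision rule (the most significant differing bit decides the outcome), not the parallel prefix circuit itself.

```python
import math


def compare_msb_first(a: int, b: int, width: int) -> int:
    """Return 1 if a > b, -1 if a < b, 0 if equal.

    Scans from the most significant bit toward the least significant
    bit and stops at the first position where the bits differ.
    """
    for i in reversed(range(width)):
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        if bit_a != bit_b:
            return 1 if bit_a else -1
    return 0


def gate_delays(n_bits: int) -> float:
    """Evaluate log4(N) + log16(N) + 4 from the abstract.

    log4(N) = log2(N) / 2 and log16(N) = log2(N) / 4.
    """
    log2_n = math.log2(n_bits)
    return log2_n / 2 + log2_n / 4 + 4
```

For the 64-bit case quoted in the abstract, `gate_delays(64)` evaluates to 3 + 1.5 + 4 = 8.5 gate delays.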

IEEE Transactions on Very Large Scale Integration Systems | 2011

A Digital CMOS Parallel Counter Architecture Based on State Look-Ahead Logic

Saleh Abdel-Hafeez; Ann Gordon-Ross

We present a high-speed wide-range parallel counter that achieves high operating frequencies through a novel pipeline partitioning methodology (a counting path and state look-ahead path), using only three simple repeated CMOS-logic module types: an initial module that generates anticipated counting states for higher significant bit modules through the state look-ahead path, simple D-type flip-flops, and 2-bit counters. The state look-ahead path prepares the counting path's next counter state prior to the clock edge such that the clock edge triggers all modules simultaneously, thus concurrently updating the count state with a uniform delay at all counting path modules/stages with respect to the clock edge. The structure is scalable to arbitrary N-bit counter widths (2-to-2^N range) using only the three module types and no fan-in or fan-out increase. The counter's delay comprises the initial module access time (a simple 2-bit counting stage), one three-input AND-gate delay, and a D-type flip-flop setup-hold time. We implemented our proposed counter using a 0.15-μm TSMC digital cell library and verified maximum operating speeds of 2 and 1.8 GHz for 8- and 17-bit counters, respectively. Finally, the area of a sample 8-bit counter was 78 125 μm² (510 transistors) and consumed 13.89 mW at 2 GHz.
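A rough behavioral model can make the look-ahead idea in the abstract above more concrete. The sketch below is an assumption-laden simplification, not the published circuit: the counter is modeled as a list of 2-bit module states (least significant first), the enable for each module is computed from the current state before the clock edge, and all modules then update at once. The names `lookahead_enables`, `clock_edge`, and `counter_value` are hypothetical.

```python
from typing import List


def lookahead_enables(modules: List[int]) -> List[bool]:
    """Decide, before the clock edge, which 2-bit modules will count.

    A module counts only when every lower module is at its maximum
    state (3), so that the carry is anticipated rather than rippled.
    """
    enables = []
    all_lower_at_max = True
    for state in modules:
        enables.append(all_lower_at_max)
        all_lower_at_max = all_lower_at_max and state == 3
    return enables


def clock_edge(modules: List[int]) -> List[int]:
    """Update all modules simultaneously using the precomputed enables."""
    enables = lookahead_enables(modules)
    return [(s + 1) % 4 if en else s for s, en in zip(modules, enables)]


def counter_value(modules: List[int]) -> int:
    """Combine the 2-bit module states into one integer."""
    return sum(s << (2 * i) for i, s in enumerate(modules))
```

Four modules model an 8-bit counter, which wraps to zero after 256 clock edges.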

international conference on signal processing | 2007

High Performance AES Design using Pipelining Structure over GF((2^4)^2)

Saleh Abdel-Hafeez; Ahmed Sawalmeh; Sameer M. Bataineh

A high-throughput AES hardware architecture is proposed by partitioning the ten rounds into sub-blocks of repeated AES modules. The blocks are separated by intermediate buffers, providing a complete ten-stage AES pipeline structure. Furthermore, the AES is internally divided evenly into ten pipeline stages, with the additional feature that the shift rows block (ShiftRow) is structured to operate before the byte substitute (ByteSubstitute) block. This swap has no effect on the AES encryption algorithm; however, it streamlines the processing of four blocks of data in parallel rather than 16 blocks, which is considered the key advantage for area saving. We evaluate the performance of our new implementation and current implementations in terms of throughput rate and hardware area for ALTERA MAX3000A family FPGA EMP3128ATC100-5. The simulation results show that the proposed AES achieves about 16% higher throughput than the general AES pipeline structure while saving 36% of the hardware area.
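The claim that swapping the two steps leaves the result unchanged follows from the fact that byte substitution acts on each byte independently while the row shift only permutes byte positions. The sketch below checks this on a sample state. It is illustrative only: `STAND_IN_SBOX` is a hypothetical placeholder bijection, not the real AES S-box, and the state is assumed to be 16 bytes in column-major order (index = row + 4 × column).

```python
from typing import List

# Hypothetical stand-in substitution table (NOT the AES S-box).
# Any bytewise table demonstrates the commutation property.
STAND_IN_SBOX = [(7 * x + 3) % 256 for x in range(256)]


def sub_bytes(state: List[int], sbox: List[int]) -> List[int]:
    """Apply the substitution table to each byte independently."""
    return [sbox[b] for b in state]


def shift_rows(state: List[int]) -> List[int]:
    """Rotate row r of the 4x4 state left by r positions.

    The state is column-major: byte (row r, column c) is at r + 4 * c.
    """
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


state = list(range(16))
swapped = sub_bytes(shift_rows(state), STAND_IN_SBOX)
standard = shift_rows(sub_bytes(state, STAND_IN_SBOX))
```

Here `swapped` and `standard` are equal, which is the property the reordered pipeline relies on.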

IEEE Transactions on Circuits and Systems Ii-express Briefs | 2006

A VLSI High-Performance Priority Encoder Using Standard CMOS Library

Saleh Abdel-Hafeez; Shadi M. Harb

A novel high-performance priority encoder design using standard CMOS library cells is proposed. The new encoder design accommodates both high- and low-priority functionalities with a scalable design structure through a special prefixing scheme. The prefixing scheme is applied to minimize the overall propagation delay and to exploit the hardware shared between the high- and low-priority evaluation logic circuitry. The proposed encoder shows significant improvement in terms of speed, robustness for top-level floor plan routing, and modularity with pattern structure compared with existing encoder designs. Simulations are conducted for different encoder input widths in 0.15-μm TSMC CMOS technology, where a 32-bit priority encoder is used as a test vehicle for comparison measurements. The expected results show that the 32-bit encoder operates at a maximum frequency of 667 MHz with a total count of 1106 transistors and a maximum power consumption of 13.8 mW.
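For readers unfamiliar with the function being implemented, a behavioral sketch of a priority encoder with both priority directions is given here. It says nothing about the prefixing circuit in the abstract above. The name `priority_encode` and the convention that "high priority" selects the highest-indexed asserted bit are assumptions made for illustration.

```python
from typing import Optional


def priority_encode(request: int, width: int,
                    high_priority: bool = True) -> Optional[int]:
    """Return the index of the winning asserted request bit.

    high_priority=True  picks the highest-indexed set bit (assumed convention).
    high_priority=False picks the lowest-indexed set bit.
    Returns None when no request bit is set.
    """
    request &= (1 << width) - 1  # keep only the encoder's input bits
    if request == 0:
        return None
    if high_priority:
        return request.bit_length() - 1
    # request & -request isolates the lowest set bit
    return (request & -request).bit_length() - 1
```

For the input `0b0110` on a 4-bit encoder, the high-priority result is index 2 and the low-priority result is index 1.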

international symposium on circuits and systems | 2008

High speed digital CMOS divide-by-N frequency divider

Saleh Abdel-Hafeez; Shadi M. Harb; William R. Eisenstadt

A high-speed scalable programmable divide-by-N frequency divider is presented. The divider includes a newly proposed state look-ahead parallel counter with a basic conventional D-type flip-flop (DFF) circuit. The counter is structured from two modules of 2-bit counter stages separated by DFF buffers, all of which are triggered at the edge of the input clock. The reload circuit is a single DFF buffer, while the count-detection circuit is constructed from a two-level decoder. The M-bit divider critical path delay, which is independent of technology, is approximated as [3.5 + log₄(M)] unit delays, where a unit delay is close to that of a 2-input NAND gate. As a result, the measured frequency drops only slightly, by about 6%, as the divider bit size increases. Furthermore, the divider circuit is attractive for continued technology scaling since the architecture is based on identical modules with a small count of CMOS transistors, limited only by the threshold voltage of the technology. The number of transistors increases approximately linearly, by about 17% per two-bit increase of the divider size. The presented 8-bit programmable divide-by-N frequency divider is capable of operating up to 2 GHz for a 1.35 V power supply voltage with a maximum power consumption of 16.78 mW and a maximum frequency divider factor of N = 256 using the TSMC 0.15 μm digital CMOS process, and gives a measured area of 95 × 143 μm² with a total count of 508 transistors.
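The delay expression and the divide-by-N behavior in the abstract above can be sketched numerically. This is a minimal behavioral model under assumptions: `critical_path_unit_delays` simply evaluates 3.5 + log₄(M), and `divide_by_n` is a hypothetical cycle-level model (one output pulse per N input clock edges, with a reload on terminal count), not the published circuit.

```python
import math
from typing import List


def critical_path_unit_delays(m_bits: int) -> float:
    """Evaluate 3.5 + log4(M), using log4(M) = log2(M) / 2."""
    return 3.5 + math.log2(m_bits) / 2


def divide_by_n(n: int, input_cycles: int) -> List[int]:
    """Emit 1 on every n-th input clock edge and 0 otherwise."""
    count = 0
    output = []
    for _ in range(input_cycles):
        count += 1
        if count == n:
            # Terminal count detected: pulse the output and reload.
            output.append(1)
            count = 0
        else:
            output.append(0)
    return output
```

For an 8-bit divider the expression gives 3.5 + 1.5 = 5.0 unit delays, and for 16 bits it gives 5.5, which illustrates how slowly the critical path grows with bit size.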

IEEE Transactions on Very Large Scale Integration Systems | 2017

An Efficient O(N) Comparison-Free Sorting Algorithm

Saleh Abdel-Hafeez; Ann Gordon-Ross

In this paper, we propose a novel sorting algorithm that sorts input data integer elements on-the-fly without any comparison operations between the data: comparison-free sorting. We present a complete hardware structure, associated timing diagrams, and a formal mathematical proof, which show an overall sorting time, in terms of clock cycles, that is linearly proportional to the number of inputs, giving a speed complexity on the order of O(N). Our hardware-based sorting algorithm precludes the need for SRAM-based memory or complex circuitry, such as pipelining structures, but rather uses simple registers to hold the binary elements and the elements’ associated number of occurrences in the input set, and uses matrix-mapping operations to perform the sorting process. Thus, the total transistor count complexity is on the order of O(N). We evaluate an application-specified integrated circuit design of our sorting algorithm for a sample sorting of N = 1024 elements of size K = 10-bit using 90-nm Taiwan Semiconductor Manufacturing Company (TSMC) technology with a 1 V power supply. Results verify that our sorting requires approximately 4–6 μs to sort the 1024 elements with a clock cycle time of 0.5 GHz, consumes 1.6 mW of power, and has a total transistor count of less than 750 000.

Journal of Circuits, Systems, and Computers | 2008

Saleh Abdel-Hafeez; Anas S. Matalkah

power and timing modeling, optimization and simulation | 2007

Saleh Abdel-Hafeez; Shadi M. Harb; William R. Eisenstadt
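As a software analogue of the comparison-free idea in the O(N) sorting abstract above, the sketch below sorts K-bit integers by recording how many times each value occurs and then reading the values back in index order. This is plain counting sort, swapped in for illustration. It is not the matrix-mapping hardware described in the abstract, and the name `comparison_free_sort` is hypothetical.

```python
from typing import List


def comparison_free_sort(values: List[int], k_bits: int) -> List[int]:
    """Sort K-bit non-negative integers without comparing elements to each other.

    Each value is used directly as an index into an occurrence table.
    The only checks are range validations against the K-bit limit.
    """
    size = 1 << k_bits
    occurrences = [0] * size
    for v in values:
        if not 0 <= v < size:
            raise ValueError(f"value {v} does not fit in {k_bits} bits")
        occurrences[v] += 1
    result: List[int] = []
    for value, count in enumerate(occurrences):
        result.extend([value] * count)
    return result
```

For the sample size in the abstract (K = 10), the occurrence table has 1024 entries. The running time of this sketch is proportional to the number of inputs plus the table size.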

IEEE Transactions on Very Large Scale Integration Systems | 2007

CMOS EIGHT-TRANSISTOR MEMORY CELL FOR LOW-DYNAMIC-POWER HIGH-SPEED EMBEDDED SRAM

Saleh Abdel-Hafeez; Shadi M. Harb; William R. Eisenstadt

Embedded SRAM design with high noise margin between read and write, low power, low supply voltages, and high speed has become essential in VLSI embedded applications. A complete embedded SRAM design with self-timing synchronization is proposed based on the CMOS eight-transistor (8T-Cell) memory cell circuit. The cell is based on the traditional six-transistor (6T-Cell) cross-coupled inverters with the addition of two NMOS transistors for a separate read buffer circuit. The read buffer structure is based on pre-charging the read bit-line during the low value of the read clock and evaluating the read bit-line during the high value of the read clock, thereby maintaining one active line per column and eliminating the use of a traditional sense amplifier with all its synchronization schemes. The simulation results show that the embedded SRAM of size 128-bit × 128-bit operates at a maximum frequency of 200 MHz for Write and Read clock cycles with a 1.62 V power supply, and measures a total average power consumption of 22.60 mW. All simulations were conducted on 0.18 μm TSMC single poly and three layers of metal, measuring a cell area of 2.2 × 3.0 μm². The circuit is not meant to replace the SRAM with the 6T-Cell transistor structure; however, it is attractive for applications related to high density with automation road-map design, such as graphic and network processor chips. In these applications, memories are introduced in many different irregular geometries and are used all over the chip with storage sizes of less than 20 k-bit; in addition, they are susceptible to large substrate noise as well as large coupling from wire routing.

The Journal of Supercomputing | 2018

Low-power content addressable memory with read/write and matched mask ports

Saleh Abdel-Hafeez; Ann Gordon-Ross; Samer Abubaker

A low-power content addressable memory (CAM) with read/write and mask match ports is proposed. The CAM cell is based on the conventional 6T cross-coupled inverters used for storing data with an addition of two NMOS transistors for reading out. In addition, the CAM has another four transistors for mask comparison operation through classical pre-charge operation. The readout port exploits a pre-charge reading mechanism in order to alleviate the drawback of power consumption generated from sensing amplifiers and all other related synchronization circuits which are structured in every column in the memory. Thus, the read and match features can have concurrent operations. An experimental CAM structure of storage size 64-bit × 128-bit is designed using 0.18-µm CMOS single poly and three layers of metals measuring a cell die area of 24.4375 µm² and a total silicon area of 0.269192 mm². The circuit works up to 200 MHz in simulation with total power consumption of 0.016 W at 1.8-V supply voltage.
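The masked match operation of a CAM can be illustrated with a short behavioral sketch. It assumes, purely for illustration, that a mask bit of 1 means "compare this bit" and a mask bit of 0 means "ignore this bit". The abstract above does not specify the mask polarity, and the name `cam_search` is hypothetical.

```python
from typing import List


def cam_search(entries: List[int], key: int, mask: int) -> List[int]:
    """Return the addresses of all stored words that match the key.

    Only bit positions where mask is 1 take part in the comparison
    (assumed polarity). XOR marks the differing bits, and the mask
    discards the positions that should be ignored.
    """
    return [addr for addr, word in enumerate(entries)
            if (word ^ key) & mask == 0]
```

With stored words `[0b1010, 0b1000, 0b0110]`, key `0b1011`, and mask `0b1100`, only the top two bits are compared, so addresses 0 and 1 match.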