
Publication


Featured research published by Hossam A. H. Fahmy.


Microelectronics Journal | 2013

Memristor-based memory

Mohammed Affan Zidan; Hossam A. H. Fahmy; Muhammad Mustafa Hussain; Khaled N. Salama

In this paper, we investigate the read operation of memristor-based memories. We analyze the sneak paths problem and provide a noise margin metric to compare the various solutions proposed in the literature. We also analyze the power consumption associated with these solutions. Moreover, we study the effect of the aspect ratio of the memory array on the sneak paths. Finally, we introduce a new technique for solving the sneak paths problem by gating the memory cell using a three-terminal memistor device.


Asilomar Conference on Signals, Systems and Computers | 1999

Fast division algorithm with a small lookup table

Patrick Hung; Hossam A. H. Fahmy; Oskar Mencer; Michael J. Flynn

This paper presents a new division algorithm, which requires two multiplication operations and a single lookup in a small table. The division algorithm takes two steps. The table lookup and the first multiplication are processed concurrently in the first step, and the second multiplication is executed in the next step. This divider uses a single multiplier and a lookup table with 2^m(2m+1) bits to produce 2m-bit results that are guaranteed correct to one ulp. By using a multiplier and a 12.5 KB lookup table, the basic algorithm generates a 24-bit result in two cycles.
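The table size quoted above can be checked directly: with m = 12, 2^m(2m+1) bits is 12.5 KB for a 24-bit result. The sketch below also illustrates one way a two-multiplication, one-lookup divider could work. The approximation X/Y ≈ X(Y_h − Y_l)/Y_h² (a first-order Taylor form), the operand range 1 ≤ Y < 2, and the names `table_size_bits` and `approx_divide` are assumptions made for illustration; the abstract does not spell out the formula.

```python
def table_size_bits(m: int) -> int:
    """Table size from the abstract: 2^m entries of (2m + 1) bits each."""
    return (2 ** m) * (2 * m + 1)


def approx_divide(x: float, y: float, m: int) -> float:
    """Approximate x / y for 1 <= y < 2 (assumed operand range).

    Assumed approximation: x / y ~= x * (y_h - y_l) / y_h**2,
    where y_h holds the upper m fractional bits of y and y_l is the rest.
    """
    # Split the divisor into a high part (table index) and a low part.
    y_h = int(y * 2 ** m) / 2 ** m
    y_l = y - y_h
    # Stand-in for the table lookup: hardware would read 1 / y_h^2 from a ROM.
    inv_sq = 1.0 / (y_h * y_h)
    # First multiplication (can overlap with the lookup in hardware).
    partial = x * (y_h - y_l)
    # Second multiplication produces the quotient estimate.
    return partial * inv_sq


if __name__ == "__main__":
    bits = table_size_bits(12)
    print(bits, "bits =", bits / 8 / 1024, "KB")
    print(approx_divide(1.5, 1.2345, 12), 1.5 / 1.2345)
```

Under this assumed form the relative error is (Y_l/Y_h)², which is below 2^-2m because Y_l < 2^-m and Y_h ≥ 1, consistent with a result of roughly 2m bits.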


IEEE Transactions on Nanotechnology | 2014

Memristor Multiport Readout: A Closed-Form Solution for Sneak Paths

Mohammed Affan Zidan; Ahmed M. Eltawil; Fadi J. Kurdahi; Hossam A. H. Fahmy; Khaled N. Salama

In this paper, we introduce, for the first time, a closed-form solution for sneak paths in memristor-based memory that does not use any gating elements. The introduced technique fully eliminates the effect of sneak paths by reading the stored data using multiple access points and evaluating a simple addition/subtraction on the different readings. The new method requires fewer reading steps than previously reported techniques and has a very small impact on the memory density. To verify the underlying theory, the proposed system is simulated using Synopsys HSPICE, showing the ability to achieve a 100% sneak-path error-free memory. In addition, the effect of quantization bits on the system performance is studied.


Symposium on Computer Arithmetic | 2003

The case for a redundant format in floating point arithmetic

Hossam A. H. Fahmy; Michael J. Flynn

We use a partially redundant number system as an internal format for floating point arithmetic operations. The redundant number system enables carry-free arithmetic operations to improve performance. Conversion from the proposed internal format back to the standard IEEE format is done only when an operand is written to memory. A detailed discussion of an adder using the proposed format is presented and the specific challenges of the design are explained. A brief description of a multiplier and divider using the proposed format is also presented. The proposed internal format and arithmetic units comply with all the rounding modes of the IEEE 754 floating point standard. Transistor simulation of the adder and multiplier confirms the performance advantage predicted by the analytical model.
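The idea of carry-free arithmetic with a late conversion can be illustrated with the textbook carry-save representation, in which a value is held as a (sum, carry) pair. This is a generic sketch on non-negative integers, not the partially redundant floating point format proposed in the paper; the function names are hypothetical.

```python
from typing import Iterable, Tuple


def carry_save_add(a: int, b: int, c: int) -> Tuple[int, int]:
    """Reduce three non-negative integers to a (sum, carry) pair.

    Every bit position is computed independently, so no carry ripples
    across the word.
    """
    s = a ^ b ^ c                                # per-bit sum
    carry = ((a & b) | (a & c) | (b & c)) << 1   # per-bit carry, moved to its weight
    return s, carry


def to_conventional(s: int, carry: int) -> int:
    """The only carry-propagating addition, done when leaving the redundant form."""
    return s + carry


def accumulate(values: Iterable[int]) -> int:
    """Keep a running total in redundant form and convert once at the end."""
    s, carry = 0, 0
    for v in values:
        s, carry = carry_save_add(s, carry, v)
    return to_conventional(s, carry)


if __name__ == "__main__":
    print(accumulate([13, 250, 7, 1024, 99]))
```

The conversion step plays the role that writing an operand back to memory plays in the abstract: the expensive carry propagation happens once, not on every operation.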


International Midwest Symposium on Circuits and Systems | 2010

A decimal floating-point fused-multiply-add unit

Rodina Samy; Hossam A. H. Fahmy; Ramy Raafat; Amira Mohamed; Tarek Eldeeb; Yasmin Farouk

This paper presents the first hardware implementation of a fully parallel decimal floating-point fused-multiply-add unit performing the operation ± (A × B) ± C on decimal floating-point operands. The proposed design is fully compliant with the IEEE 754-2008 standard and supports the two standard formats decimal64 and decimal128. Furthermore, the proposed design may be controlled to perform the multiplication or the addition/subtraction as standalone operations. Our decimal floating-point FMA may be pipelined so that a complete decimal floating-point result is available each clock cycle.
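What distinguishes a fused multiply-add from a separate multiply and add is that A × B + C is rounded only once. A small software illustration using Python's standard `decimal` module follows. The context below merely approximates decimal64 (16 significant digits), and the operand values are chosen for illustration; none of this describes the hardware design itself.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# Software stand-in for decimal64: 16 significant digits (assumed parameters).
DEC64 = Context(prec=16, rounding=ROUND_HALF_EVEN, Emin=-383, Emax=384)

a = Decimal("1.000000000000001")
b = Decimal("1.000000000000001")
c = Decimal("-1.000000000000002")

# Separate operations: the product is rounded to 16 digits before the add,
# so the low-order part of a * b is lost.
unfused = DEC64.add(DEC64.multiply(a, b), c)

# Fused operation: a * b + c is formed exactly and rounded once.
fused = DEC64.fma(a, b, c)

if __name__ == "__main__":
    print("multiply then add:", unfused)
    print("fused multiply-add:", fused)
```

The negative C operand here corresponds to the (A × B) − C variant of the operation named in the abstract.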


Asilomar Conference on Signals, Systems and Computers | 2008

A decimal fully parallel and pipelined floating point multiplier

Ramy Raafat; Amira M. Abdel-Majeed; Rodina Samy; Tarek Eldeeb; Yasmin Farouk; Mostafa Elkhouly; Hossam A. H. Fahmy

Decimal arithmetic is important in several commercial applications including financial analysis, banking, tax calculation, currency conversion, insurance, and accounting. This paper presents a fully parallel Decimal64 floating point (FP) multiplier compliant with IEEE Std 754-2008 for floating point arithmetic. The proposed multiplier uses novel methods to target low latency. The proposed design is based on a previously published fixed point multiplier that uses a novel BCD-4221 recoding for decimal digits to improve the area and latency of the partial product generation and the partial product reduction tree. Several enhancements are introduced to the design: the final carry propagation adder is implemented using a fully parallel decimal adder with a Kogge-Stone prefix tree, and the sticky bit is generated in parallel with the shifter to reduce the critical path delay. The design is extendable to support Decimal128 floating point multiplication. The multiplier is hardware verified for functionality on an FPGA.


EURASIP Journal on Embedded Systems | 2011

A precise high-level power consumption model for embedded systems software

Mostafa E. A. Ibrahim; Markus Rupp; Hossam A. H. Fahmy

The increasing demand for portable computing has elevated power consumption to be one of the most critical embedded systems design parameters. In this paper, we present a precise high-level power estimation methodology for the software loaded on a VLIW processor that is based on a functional level power model. The targeted processor of our approach is the TMS320C6416T DSP from Texas Instruments. We consider several important issues in our model such as pipeline stalls, inter-instruction effects, and cache misses. The contributions are the following. First, we propose a precise model to estimate the power consumption of the targeted DSP while running a software algorithm. Second, we validate the model and demonstrate its precision on many typical algorithms applied in signal and image processing. Third, we further validate the precision of our model on a real application from the video processing field. The power consumption estimated by our model is compared to the physically measured power consumption, achieving a very low average absolute estimation error of 1.65% and a maximum absolute estimation error of only 3.3%.


International Conference on Microelectronics | 2000

Complete logic family using tunneling-phase-logic devices

Hossam A. H. Fahmy; R.A. Kiehl

This paper presents the work done to develop and characterize the behavior of binary tunneling phase logic (TPL) devices. Three-input NAND, NOR and MINORITY functions are demonstrated using a single TPL element. The fan-out of the gates is discussed as well as the loading effects of multiple gates in cascade. Stable regions of operation are reported and future research possibilities are explored.
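As background on why a MINORITY function is useful for building a logic family, the sketch below checks a standard Boolean identity: a three-input minority gate with one input tied to 0 behaves as a two-input NAND, and with that input tied to 1 it behaves as a two-input NOR. This is a generic truth-table illustration with hypothetical function names; it does not model how the TPL element itself realizes its three-input NAND and NOR functions.

```python
from itertools import product


def minority(a: int, b: int, c: int) -> int:
    """Return 1 when fewer than two of the three inputs are 1."""
    return int(a + b + c <= 1)


def nand2(a: int, b: int) -> int:
    """Two-input NAND obtained by tying the third minority input to 0."""
    return minority(a, b, 0)


def nor2(a: int, b: int) -> int:
    """Two-input NOR obtained by tying the third minority input to 1."""
    return minority(a, b, 1)


if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        print(a, b, "NAND:", nand2(a, b), "NOR:", nor2(a, b))
```

Since NAND (or NOR) alone is functionally complete, a minority gate with constant inputs is enough to build any combinational function.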


Scientific Reports | 2016

Single-Readout High-Density Memristor Crossbar

Mohammed Affan Zidan; Hesham Omran; Rawan Naous; Ahmed K. Sultan; Hossam A. H. Fahmy; Wei Lu; Khaled N. Salama

High-density memristor-crossbar architecture is a very promising technology for future computing systems. The simplicity of the gateless-crossbar structure is both its principal advantage and the source of undesired sneak-paths of current. This parasitic current could consume an enormous amount of energy and ruin the readout process. We introduce new adaptive-threshold readout techniques that utilize the locality and hierarchy properties of the computer-memory system to address the sneak-paths problem. The proposed methods require a single memory access per pixel for an array readout. In addition, the memristive crossbar consumes an order of magnitude less power than state-of-the-art readout techniques.


Symposium on Computer Arithmetic | 2009

Energy and Delay Improvement via Decimal Floating Point Units

Hossam A. H. Fahmy; Ramy Raafat; Amira M. Abdel-Majeed; Rodina Samy; Tarek Eldeeb; Yasmin Farouk

Interest in decimal arithmetic has increased considerably in recent years. This paper presents new designs for decimal floating point (DFP) addition, multiplication, fused multiply-add, division, and square root. It stresses the importance of energy savings achieved by hardware implementations of the IEEE standard for decimal floating point. To the best of the authors' knowledge, this is the first work to discuss energy savings in DFP and the first to present a hardware implementation of a fused multiply-add. Our Newton-Raphson based divider is over three times faster than a similar design previously reported.

Collaboration


Top co-authors of Hossam A. H. Fahmy.

Yehea I. Ismail

American University in Cairo

Khaled N. Salama

King Abdullah University of Science and Technology

Mohammed Affan Zidan

King Abdullah University of Science and Technology