F. El-Guibaly | Researchain

Publication

Featured research published by F. El-Guibaly.

IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing | 1996

Area-efficient multipliers for digital signal processing applications

Sunder S. Kidambi; F. El-Guibaly; Andreas Antoniou

An area-efficient parallel sign-magnitude multiplier that receives two N-bit numbers and produces an N-bit product, referred to as a truncated multiplier, is described. The product is quantized to N bits by omitting about half of the adder cells needed to add the partial products. To keep the quantization error to a minimum, probabilistic biases are obtained and fed to the inputs of the retained adder cells. The truncated multiplier requires approximately 50% of the area of a standard parallel multiplier. The paper then shows that this design strategy can also be applied to the design of two's-complement multipliers. It concludes by applying the truncated multiplier to the implementation of a digital filter and shows that the signal-to-noise ratio of the digital filter using a truncated multiplier is better than that using a standard multiplier.
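
The truncation idea can be illustrated with a small behavioral model. This is a minimal sketch, not the paper's circuit: it works on unsigned magnitudes only, and it replaces the probabilistic biases fed to individual adder cells with a single constant equal to the expected value of the dropped partial-product bits. The function names `expected_dropped_sum` and `truncated_multiply` are hypothetical.

```python
def expected_dropped_sum(n: int) -> int:
    """Expected value of the omitted low-order partial-product bits.

    Assumption: input bits are independent and uniformly random, so each
    partial-product bit a_i & b_j is 1 with probability 1/4. Column k
    (weight 2**k) holds k + 1 such bits for k = 0 .. n-1.
    """
    return sum((k + 1) << k for k in range(n)) // 4


def truncated_multiply(a: int, b: int, n: int) -> int:
    """Approximate the upper n bits of a * b for n-bit unsigned magnitudes.

    Only partial-product bits in columns n .. 2n-2 are summed, which keeps
    n(n-1)/2 of the n*n bits (roughly half). A constant bias stands in
    for the contribution of the dropped columns.
    """
    kept = 0
    for i in range(n):
        for j in range(n):
            if i + j >= n:  # retained column
                kept += (((a >> i) & (b >> j)) & 1) << (i + j)
    return (kept + expected_dropped_sum(n)) >> n


if __name__ == "__main__":
    n = 4
    worst = max(
        abs(truncated_multiply(a, b, n) - ((a * b) >> n))
        for a in range(1 << n)
        for b in range(1 << n)
    )
    print("worst-case error in LSBs for n = 4:", worst)
```

With a single constant bias and no extra retained columns, this coarse model can be off by a few least-significant bits; it is meant only to show where the area saving comes from.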

IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing | 2000

A fast parallel multiplier-accumulator using the modified Booth algorithm

F. El-Guibaly

This paper presents a dependence graph (DG) to visualize and describe merged multiply-accumulate (MAC) hardware that is based on the modified Booth algorithm (MBA). The carry-save technique is used in the Booth encoder, the Booth multiplier, and the accumulator sections to ensure the fastest possible implementation. The DG applies to any MAC data word size and allows designing multiplier structures that are regular and have minimal delay, sign-bit extensions, and datapath width. Using the DG, a fast pipelined implementation is proposed, in which an accurate delay model for deep submicron CMOS technology is used. The delay model describes multi-level gate delays, taking into account input ramp and output loading. Based on the delay model, the proposed pipelined parallel MAC design is three times faster than other parallel MAC schemes that are based on the MBA. The speedup results from merging the accumulate and multiply operations and from the wide use of carry-save techniques.
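
For readers unfamiliar with the modified Booth algorithm, the recoding step can be sketched in a few lines. This is a behavioral illustration of standard radix-4 Booth recoding followed by a merged accumulate, assuming a two's-complement multiplier; it does not model the paper's dependence graph, carry-save adders, or pipelining. The function names are hypothetical.

```python
def booth_radix4_digits(y: int, n_bits: int) -> list[int]:
    """Recode a signed n_bits multiplier into radix-4 digits in {-2, ..., 2}.

    Digit i is b[2i] + b[2i-1] - 2*b[2i+1], with b[-1] = 0, so that
    y == sum(d * 4**i for i, d in enumerate(digits)).
    """
    if n_bits % 2:
        n_bits += 1  # sign-extend to an even width
    u = y & ((1 << n_bits) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0  # implicit bit b[-1]
    for i in range(0, n_bits, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_mac(acc: int, x: int, y: int, n_bits: int) -> int:
    """Return acc + x * y, adding one Booth partial product per digit."""
    for i, d in enumerate(booth_radix4_digits(y, n_bits)):
        acc += (d * x) << (2 * i)
    return acc


if __name__ == "__main__":
    print(booth_radix4_digits(7, 4))  # [-1, 2], since -1 + 2*4 == 7
    print(booth_mac(10, 3, -5, 4))    # -5, since 10 + 3*(-5) == -5
```

Folding the accumulator into the same summation as the partial products is the behavioral counterpart of merging the multiply and accumulate operations; in hardware the additions would be carried out in carry-save form.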

Microprocessors and Microsystems | 2000

A quantitative study for Java microprocessor architectural requirements. Part II: high-level language support

M.W. El-Kharashi; F. El-Guibaly; Kin Fun Li

Java was designed for network programming. This imposes certain requirements on its virtual machine instruction set architecture and on designs that intend to support Java. The purpose of this study is to carry out a behavioral analysis of the different aspects of Java instruction set architecture. Performance metrics were collected through benchmarking a bytecode interpreter. In this second part of our two-part paper, we study the instruction set utilization, instruction execution time, method invocation behavior, and the effect of object-orientation.

Microprocessors and Microsystems | 2000

A quantitative study for Java microprocessor architectural requirements. Part I: Instruction set design

M.W. El-Kharashi; F. El-Guibaly; Kin Fun Li

Java was designed for network programming. This imposes certain requirements on its virtual machine instruction set architecture and on designs that support Java. The purpose of this study is to carry out a behavioral analysis of the different aspects of Java instruction set architecture. This will help in establishing the hardware requirements for executing Java bytecodes. First, the bytecode interpreter was instrumented to include performance counters and statistics collectors. Then performance metrics were collected through benchmarking. Analyzing these data helps to identify performance-critical aspects that are candidates for hardware support, while less critical aspects can be left for software implementations. In this first part of our two-part paper, we study access patterns for data types, addressing modes, and instruction encoding. Recommendations for architectural requirements for Java processors will be made throughout this study.

Pacific Rim Conference on Communications, Computers and Signal Processing | 1997

Java microprocessors: computer architecture implications

M.W. El-Kharashi; F. El-Guibaly

Java appears to dominate the high-level programming world. It has the potential to become a standard for broad-base application development. In addition, its portability makes it ideal for the Internet. Java is compiled to an abstract virtual machine to achieve architectural neutrality. However, as an interpreted language, it suffers from slow performance. Some advanced solutions, such as just-in-time compilers, have appeared and achieve partial improvements. Java microprocessors are the natural solution. These chips will execute Java code natively as their assembly language. Their design brings new concepts to hardware implementations. Technical challenges of this kind are always attractive, and the popularity and pervasiveness of Java open an opportunity for designing these processors. This survey presents an overview of Java microprocessors. It includes a benefit and feasibility study together with the challenges that face these chips and their potential applications.

Multidimensional Systems and Signal Processing | 1996

Mapping 3-D IIR digital filter onto systolic arrays

F. El-Guibaly; A. Tawfik

We present here an efficient systolic implementation for 3-D IIR digital filters. The systolic implementation is obtained by using an algebraic mapping technique. This new mapping technique gives us the choice to mix pipelined variables and broadcast variables. We also determine, through the mapping method, the buffer sizes, the direction of variable propagation, and the data feeding and extracting points. The resultant systolic array implementation is a modular structure composed of 2-D filter modules connected by simple buffers. This new systolic implementation is regular, modular, and amenable to VLSI implementation.

International Symposium on Circuits and Systems | 1990

Systolic implementations of two-dimensional recursive digital filters

S. Sunder; F. El-Guibaly; A. Antoniou

Implementations for two-dimensional recursive filters using systolic arrays with linear structures are presented. The processing elements used are modular and thus lead to cost-effective designs. The implementations have the maximum data rate possible, i.e., a new input sample is supplied and a new output sample is obtained every sampling period. The latency of all the systolic arrays designed is equal to one. The number of processing elements required is on the order of w*h, where w and h are, respectively, the width and height of the window used.

Multidimensional Systems and Signal Processing | 1992

Systolic implementation of digital filters

S. Sunder; F. El-Guibaly; Andreas Antoniou

A systematic method for the mapping of digital filter algorithms onto systolic hardware is presented. The method is based on the z-domain characterization of the required filter. It yields filter structures that are modular, pipelined, and hierarchical, and can be used to obtain multidimensional structures. All the structures discussed have a latency of one sampling period and some have maximum concurrency. The paper also deals with the problems of line and frame wrap-around that are inherent in raster-scanned images and suggests ways to eliminate them.

Signal Processing | 1996

Design of low-delay two-channel FIR filter banks using constrained optimization

Esam Abdel-Raheem; F. El-Guibaly; Andreas Antoniou

Two approaches for the design of two-channel perfect reconstruction FIR filter banks with short reconstruction delays are presented. The approaches are based on constrained optimization. In the first approach, a low-order filter is first designed and the objective function of the filter bank is formulated as a quadratic programming problem with linear constraints. Then the Lagrange-multiplier method is used to design a higher-order filter. The method is simple, efficient, and flexible, and an exact solution is obtained by solving a set of linear equations. The second approach can be used to design filters of equal as well as unequal lengths. In this approach, the design problem is formulated as a quadratic-constrained least-squares minimization problem which can be solved using standard minimization algorithms. Design examples are given to illustrate the advantages of the proposed approaches. The quality of reconstruction is considered very good and superior to that of existing methods.

Journal of Systems Architecture | 2001

A robust stack folding approach for Java processors: an operand extraction-based algorithm

M.W. El-Kharashi; F. El-Guibaly; Kin Fun Li

Data dependency in stack operations limits the performance of Java processors. To enhance Java's performance, existing literature suggests using stack operations folding. We extend this concept in a new folding algorithm that identifies principal operations in folding groups and extracts necessary operands from the bytecode queue. The proposed algorithm permits nested pattern folding and multiple issue of folding groups. Hence, the need for and therefore the limitations of a stack are eliminated. This paper discusses various aspects of the proposed algorithm and illustrates different folding scenarios as well as possible hazards. Benchmarking using SPECjvm98 shows excellent performance gains as compared to existing algorithms.
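
The basic idea of stack operations folding can be shown with a toy example. This is a minimal sketch of fixed-pattern folding only (two loads, an operator, and a store collapsed into one register-style operation); it is not the operand extraction-based algorithm of the paper and does not handle nested patterns, multiple issue, or hazards. Bytecodes are modeled as hypothetical tuples such as `("iload", 1)`, and the function name `fold` is an assumption.

```python
PRODUCERS = {"iload"}                                     # push a local onto the stack
OPERATORS = {"iadd": "add", "isub": "sub", "imul": "mul"}  # consume two, push one
CONSUMERS = {"istore"}                                    # pop into a local


def fold(bytecodes):
    """Fold producer, producer, operator, consumer groups into one operation.

    A folded group becomes (op, dest_local, src_local_1, src_local_2);
    bytecodes that do not start such a group pass through unchanged.
    """
    out = []
    i = 0
    while i < len(bytecodes):
        w = bytecodes[i:i + 4]
        if (len(w) == 4
                and w[0][0] in PRODUCERS and w[1][0] in PRODUCERS
                and w[2][0] in OPERATORS and w[3][0] in CONSUMERS):
            out.append((OPERATORS[w[2][0]], w[3][1], w[0][1], w[1][1]))
            i += 4
        else:
            out.append(bytecodes[i])
            i += 1
    return out


if __name__ == "__main__":
    program = [("iload", 1), ("iload", 2), ("iadd",), ("istore", 3)]
    print(fold(program))  # [('add', 3, 1, 2)]
```

The folded tuple names its operands directly, so the four stack bytecodes no longer need to pass values through the operand stack.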
