Esam El-Araby | Researchain

Publication

Featured research published by Esam El-Araby.

IEEE Computer | 2008

The Promise of High-Performance Reconfigurable Computing

Tarek A. El-Ghazawi; Esam El-Araby; Miaoqing Huang; Kris Gaj; Volodymyr V. Kindratenko; Duncan A. Buell

Several high-performance computers now use field-programmable gate arrays as reconfigurable coprocessors. The authors describe the two major contemporary HPRC architectures and explore the pros and cons of each using representative applications from remote sensing, molecular dynamics, bioinformatics, and cryptanalysis.

field-programmable technology | 2004

Wavelet spectral dimension reduction of hyperspectral imagery on a reconfigurable computer

Esam El-Araby; Tarek A. El-Ghazawi; J. Le Moigne; K. Gaj

Hyperspectral imagery, by definition, provides valuable remote sensing observations at hundreds of frequency bands. Conventional image classification (interpretation) methods may not be used without dimension reduction preprocessing. Automatic wavelet reduction has been proven to yield better or comparable classification accuracy, while achieving substantial computational savings. However, the large hyperspectral data volumes continue to present a challenge for traditional processing techniques. Reconfigurable computers (RCs) can leverage the synergism between conventional processors and FPGAs to provide low-level hardware functionality at the same level of programmability as general-purpose computers. We investigate the potential of using RCs for on-board preprocessing of hyperspectral imagery, i.e. aboard airborne/spaceborne carriers, by prototyping for the first time the automatic wavelet dimension reduction algorithm. Our investigation exploits the fine- and coarse-grain parallelism provided by the RCs and has been experimentally verified on one of the state-of-the-art reconfigurable platforms, SRC-6E. An order of magnitude speedup over traditional processing techniques has been reported.

ACM Transactions on Reconfigurable Technology and Systems | 2009

Exploiting Partial Runtime Reconfiguration for High-Performance Reconfigurable Computing

Esam El-Araby; Ivan Gonzalez; Tarek A. El-Ghazawi

Runtime Reconfiguration (RTR) has traditionally been utilized as a means of exploiting the flexibility of High-Performance Reconfigurable Computers (HPRCs). However, the RTR feature comes at the cost of high configuration overhead, which might negatively impact overall performance. Modern FPGAs have more advanced mechanisms for reducing configuration overheads, particularly Partial Runtime Reconfiguration (PRTR). It has been perceived that PRTR on HPRC systems can be the trend for improving performance. In this work, we investigate the potential of PRTR on HPRC by formally analyzing the execution model and experimentally verifying our analytical findings by enabling PRTR for the first time, to the best of our knowledge, on one of the current HPRC systems, Cray XD1. Our approach is general and can be applied to any of the available HPRC systems. The paper concludes with recommendations and conditions, based on our conceptual and experimental work, for the optimal utilization of PRTR as well as possible future usage in HPRC.

field-programmable technology | 2005

Performance of sorting algorithms on the SRC 6 reconfigurable computer

John Harkins; Tarek A. El-Ghazawi; Esam El-Araby; Miaoqing Huang

The execution speed of the FPGA processing elements is compared to that of the microprocessor processing elements in the SRC 6 reconfigurable computer using the following sorting algorithms: quick sort, heap sort, radix sort, bitonic sort, and odd/even merge. The results show that, for sorting, FPGA technology may not be the best processor choice and that factors such as memory bandwidth, clock speed, algorithm computational density, and an algorithm's ability to be pipelined all have an impact on FPGA performance.

international conference on parallel processing | 2011

GPU Resource Sharing and Virtualization on High Performance Computing Systems

Teng Li; Vikram K. Narayana; Esam El-Araby; Tarek A. El-Ghazawi

Modern Graphics Processing Units (GPUs) are widely used as application accelerators in the High Performance Computing (HPC) field due to their massive floating-point computational capabilities and highly data-parallel computing architecture. Contemporary high performance computers equipped with co-processors such as GPUs primarily execute parallel applications using the Single Program Multiple Data (SPMD) model, which requires balanced computing resources of both microprocessors and co-processors to ensure full system utilization. While the inclusion of GPUs in HPC systems provides more computing resources and significant performance improvements, the asymmetrical distribution of the number of GPUs relative to the microprocessors can result in an underutilization of overall system computing resources. In this paper, we propose a GPU resource virtualization approach to allow underutilized microprocessors to share the GPUs. We analyze factors affecting the parallel execution performance on GPUs and conduct a theoretical performance estimation based on the most recent GPU architectures as well as the SPMD model. Then we present the implementation details of the virtualization infrastructure, followed by an experimental verification of the proposed concepts using an NVIDIA Fermi GPU computing node. The results demonstrate a considerable performance gain over the traditional SPMD execution without virtualization. Furthermore, the proposed solution enables full utilization of the asymmetrical system resources, through the sharing of the GPUs among microprocessors, while incurring low overheads due to the virtualization layer.

southern conference programmable logic | 2007

Comparative Analysis of High Level Programming for Reconfigurable Computers: Methodology and Empirical Study

Esam El-Araby; Mohamed Taher; Mohamed Abouellail; Tarek A. El-Ghazawi; Gregory B. Newby

Most application developers are willing to give up some performance and chip utilization in exchange for productivity. High-level tools for developing reconfigurable computing applications trade performance for ease of use. However, it is hard to know in a general sense how much performance and utilization one is giving up and how much ease of use one is gaining. More importantly, given the lack of standards and the uncertainty generated by sales literature, it is very hard to know the real differences that exist among different high-level programming paradigms. In order to do so, one needs a formal methodology and/or a framework that uses a common set of metrics and common experiments over a number of representative tools. In this work, we consider three representative high-level tools, Impulse-C, Mitrion-C, and DSPLogic, in the Cray XD1 environment. These tools were selected to represent imperative programming, functional programming, and graphical programming, and thereby demonstrate the applicability of our methodology. It will be shown that, in spite of the disparity in concepts behind those tools, our methodology is able to formally uncover the basic differences among them and analytically assess their comparative performance, utilization, and ease of use.

International Journal of Reconfigurable Computing | 2012

High-Performance Reconfigurable Computing

Khaled Benkrid; Esam El-Araby; Miaoqing Huang; Kentaro Sano; Thomas Steinke

1 School of Engineering, The University of Edinburgh, Edinburgh EH9 3JL, UK
2 Electrical Engineering and Computer Science, The Catholic University of America, Washington, DC 20064, USA
3 Department of Computer Science and Computer Engineering, University of Arkansas, Fayetteville, AR 72701, USA
4 Graduate School of Information Sciences, Tohoku University, 6-6-01 Aramaki Aza Aoba, Sendai 980-8579, Japan
5 Zuse-Institut Berlin (ZIB), Takustraße 7, 14195 Berlin-Dahlem, Germany

international workshop on high-performance reconfigurable computing technology and applications | 2008

Virtualizing and sharing reconfigurable resources in High-Performance Reconfigurable Computing systems

Esam El-Araby; Ivan Gonzalez; Tarek A. El-Ghazawi

High-performance reconfigurable computers (HPRCs) are parallel computers augmented with FPGA chips. Examples of such systems are the Cray XT5h and Cray XD1, the SRC-7 and SRC-6, and the SGI Altix/RASC. The execution of parallel applications on HPRCs mainly follows the single-program multiple-data (SPMD) model, as is largely the case in traditional high-performance computers (HPCs). In addition, the prevailing usage of FPGAs in such systems has been as co-processors. The overall system resources, however, are often underutilized because of the asymmetric distribution of the reconfigurable processors relative to the conventional processors. This asymmetry is often a challenge for using the SPMD programming model on these systems. In this work, we propose a resource virtualization solution based on partial run-time reconfiguration (PRTR). This technique allows the reconfigurable processors to be shared among the underutilized processors. We present our virtualization infrastructure augmented with an analytical investigation. We verify our proposed concepts with experimental implementations using the Cray XD1 as a testbed. It will be shown that this approach is quite promising and allows full exploitation of the system resources with fair sharing of the reconfigurable processors among the microprocessors. Our approach is general and can be applied to any of the available HPRC systems.

IEEE Antennas and Wireless Propagation Letters | 2013

Parallelizing Fast Multipole Method for Large-Scale Electromagnetic Problems Using GPU Clusters

Quang M. Nguyen; Vinh Dang; Ozlem Kilic; Esam El-Araby

This letter investigates the solution of large-scale electromagnetic problems by using the single-level Fast Multipole Method (FMM). Problems of large scale require high computational capability that cannot be accommodated using conventional computing systems. We investigate a parallel implementation of FMM on a 13-node graphics processing unit (GPU) cluster populated with Nvidia Tesla M2090 GPUs. The implementation details and the performance achievements in terms of accuracy, speedup, and scalability are discussed. The experimental results demonstrate that our FMM implementation on GPUs is much faster (up to 700×) than the CPU implementation. Moreover, the scalability of the GPU implementation is very close to the theoretical linear expectations.

field-programmable technology | 2005

Prototyping automatic cloud cover assessment (ACCA) algorithm for remote sensing on-board processing on a reconfigurable computer

Esam El-Araby; Mohamed Taher; Tarek A. El-Ghazawi; J. Le Moigne

Clouds have a critical role in many studies, e.g. weather- and climate-related studies. However, they represent a source of errors in many applications, and the presence of cloud contamination can hinder the use of satellite data. This requires a cloud detection process to mask out cloudy pixels from further processing. The trend for remote sensing satellite missions has always been towards smaller size, lower cost, more flexibility, and higher computational power. Reconfigurable computers (RCs) combine the flexibility of traditional microprocessors with the power of field programmable gate arrays (FPGAs). Therefore, RCs are a promising candidate for on-board preprocessing. This paper presents the design and implementation of an RC-based real-time cloud detection system. We investigate the potential of using RCs for on-board preprocessing by prototyping the Landsat 7 ETM+ ACCA algorithm on one of the state-of-the-art reconfigurable platforms, SRC-6E. Although a reasonable number of investigations of the ACCA cloud detection algorithm using FPGAs have been reported in the literature, very few details/results were provided and/or limited contributions were accomplished. Our work has been proven to provide higher performance and higher detection accuracy.