Tarek A. El-Ghazawi | Researchain

Publication

Featured research published by Tarek A. El-Ghazawi.

IEEE Transactions on Geoscience and Remote Sensing | 2003

Automatic reduction of hyperspectral imagery using wavelet spectral analysis

Sinthop Kaewpijit; J. Le Moigne; Tarek A. El-Ghazawi

Hyperspectral imagery provides richer information about materials than multispectral imagery. The larger data volumes from hyperspectral sensors present a challenge for traditional processing techniques. For example, identifying each ground surface pixel by its corresponding spectral signature is still difficult because of the immense volume of data. Conventional classification methods may not be usable without dimension reduction preprocessing. This is due to the curse of dimensionality: the sample size needed to estimate a function of several variables to a given degree of accuracy grows exponentially with the number of variables. Principal component analysis (PCA) has been the technique of choice for dimension reduction. However, PCA is computationally expensive and does not eliminate anomalies that can be seen at one arbitrary band. Spectral data reduction using automatic wavelet decomposition could be useful because it preserves the distinctions among spectral signatures. It is also computed in an automatic fashion and can filter data anomalies. This is due to the intrinsic properties of wavelet transforms, which preserve high- and low-frequency features and therefore preserve the peaks and valleys found in typical spectra. Compared to PCA, for the same level of data reduction, we show that automatic wavelet reduction yields better or comparable classification accuracy for hyperspectral data, while achieving substantial computational savings.
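The idea of reducing a pixel's spectrum with a wavelet low-pass decomposition can be sketched as follows. This is a minimal illustration only: the Haar filter, the function names, and the fixed number of levels are assumptions made here, not the filter or the automatic level selection used in the paper.

```python
import math

SCALE = 1 / math.sqrt(2)  # orthonormal Haar scaling factor


def haar_lowpass(spectrum):
    """One Haar decomposition level, keeping only the approximation coefficients."""
    values = list(spectrum)
    if len(values) % 2:
        values.append(values[-1])  # pad odd-length input by repeating the last band
    return [(values[i] + values[i + 1]) * SCALE for i in range(0, len(values), 2)]


def reduce_spectrum(spectrum, levels):
    """Apply `levels` low-pass steps; each step roughly halves the band count."""
    out = list(spectrum)
    for _ in range(levels):
        out = haar_lowpass(out)
    return out


def reconstruct(approx, levels, length):
    """Invert the reduction with all detail coefficients set to zero."""
    out = list(approx)
    for _ in range(levels):
        out = [v * SCALE for v in out for _ in (0, 1)]  # each coefficient feeds two samples
    return out[:length]
```

Comparing `reconstruct(reduce_spectrum(s, k), k, len(s))` with the original spectrum `s` gives a rough sense of how much of the signature's shape survives `k` levels of reduction.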

IEEE Computer | 2008

The Promise of High-Performance Reconfigurable Computing

Tarek A. El-Ghazawi; Esam El-Araby; Miaoqing Huang; Kris Gaj; Volodymyr V. Kindratenko; Duncan A. Buell

Several high-performance computers now use field-programmable gate arrays as reconfigurable coprocessors. The authors describe the two major contemporary HPRC architectures and explore the pros and cons of each using representative applications from remote sensing, molecular dynamics, bioinformatics, and cryptanalysis.

Conference on High Performance Computing (Supercomputing) | 2002

UPC Performance and Potential: A NPB Experimental Study

Tarek A. El-Ghazawi; François Cantonnet

UPC, or Unified Parallel C, is a parallel extension of ANSI C. UPC follows a distributed shared memory programming model aimed at leveraging the ease of programming of the shared memory paradigm, while enabling the exploitation of data locality. UPC incorporates constructs that allow placing data near the threads that manipulate them to minimize remote accesses. This paper gives an overview of the concepts and features of UPC and establishes, through extensive performance measurements of NPB workloads, the viability of the UPC programming language compared to the other popular paradigms. Further, through performance measurements we identify the challenges, the remaining steps and the priorities for UPC. It will be shown that with proper hand tuning and optimized collective operations libraries, UPC performance will be comparable to that of MPI. Furthermore, by incorporating such improvements into automatic compiler optimizations, UPC will compare quite favorably to message passing in ease of programming.
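To make the data-placement idea concrete: in UPC, a shared array declared with a block size B is laid out in blocks of B consecutive elements assigned round-robin to threads. The Python sketch below only models that affinity rule; the function names are hypothetical and it is not UPC code.

```python
def owner_thread(index, block_size, threads):
    """Thread with affinity to element `index` of a block-cyclic shared array."""
    return (index // block_size) % threads


def local_indices(thread, n, block_size, threads):
    """Indices of an n-element shared array that are local to `thread`."""
    return [i for i in range(n) if owner_thread(i, block_size, threads) == thread]
```

A loop that lets each thread touch only the indices returned by `local_indices` for itself performs no remote accesses, which is the locality the abstract refers to.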

ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2005

An Evaluation of Global Address Space Languages: Co-Array Fortran and Unified Parallel C

Cristian Coarfa; Yuri Dotsenko; John M. Mellor-Crummey; François Cantonnet; Tarek A. El-Ghazawi; Ashrujit Mohanti; Yiyi Yao; Daniel G. Chavarría-Miranda

Co-array Fortran (CAF) and Unified Parallel C (UPC) are two emerging languages for single-program, multiple-data global address space programming. These languages boost programmer productivity by providing shared variables for inter-process communication instead of message passing. However, the performance of these emerging languages still has room for improvement. In this paper, we study the performance of variants of the NAS MG, CG, SP, and BT benchmarks on several modern architectures to identify challenges that must be met to deliver top performance. We compare CAF and UPC variants of these programs with the original Fortran+MPI code. Today, CAF and UPC programs deliver scalable performance on clusters only when written to use bulk communication. However, our experiments uncovered some significant performance bottlenecks of UPC codes on all platforms. We account for the root causes limiting UPC performance such as the synchronization model, the communication efficiency of strided data, and source-to-source translation issues. We show that they can be remedied with language extensions, new synchronization constructs, and, finally, adequate optimizations by the back-end C compilers.

Archive | 2006

Reconfigurable Computing: Architectures, Tools and Applications

Andreas Koch; Ram K. Krishnamurthy; John McAllister; Roger F. Woods; Tarek A. El-Ghazawi

Clustering a large number of data points is a computationally demanding task that often needs to be accelerated in order to be useful in practice. The focus of this work is on the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, one of the state-of-the-art clustering algorithms, and on its acceleration using an FPGA device. The paper presents a novel, optimised and scalable architecture that takes advantage of the internal memory structure of modern FPGAs in order to deliver a high-performance clustering system. Results show that the developed system can obtain average speed-ups of 32x in real-world tests and 202x in synthetic tests when compared to state-of-the-art software counterparts.

International Parallel and Distributed Processing Symposium | 2004

Productivity analysis of the UPC language

François Cantonnet; Yiyi Yao; Mohamed Zahran; Tarek A. El-Ghazawi

Summary form only given. Parallel programming paradigms, over the past decade, have focused on how to harness the computational power of contemporary parallel machines. Ease of use and code development productivity have been a secondary goal. Recently, however, there has been a growing interest in understanding code development productivity issues and their implications for the overall time-to-solution. Unified Parallel C (UPC) is a recently developed language that has been gaining increasing attention. UPC holds the promise of leveraging the ease of use of the shared memory model and the performance benefit of locality exploitation. The performance potential of UPC has been extensively studied in recent research efforts. The aim of this study, however, is to examine the impact of UPC on programmer productivity. We propose several productivity metrics and consider a wide array of high-performance applications. Further, we compare UPC to the most widely used parallel programming paradigm, MPI. The results show that UPC compares favorably with MPI in programmer productivity.

IEEE Transactions on Computers | 2011

New Hardware Architectures for Montgomery Modular Multiplication Algorithm

Miaoqing Huang; Kris Gaj; Tarek A. El-Ghazawi

Montgomery modular multiplication is one of the fundamental operations used in cryptographic algorithms, such as RSA and Elliptic Curve Cryptosystems. At CHES 1999, Tenca and Koç proposed the Multiple-Word Radix-2 Montgomery Multiplication (MWR2MM) algorithm and introduced a now-classic architecture for implementing Montgomery multiplication in hardware. With parameters optimized for minimum latency, this architecture performs a single Montgomery multiplication in approximately 2n clock cycles, where n is the size of the operands in bits. In this paper, we propose two new hardware architectures that are able to perform the same operation in approximately n clock cycles with almost the same clock period. These two architectures are based on precomputing partial results using two possible assumptions regarding the most significant bit of the previous word. They outperform the original architecture of Tenca and Koç in terms of the product of latency and area by 23 and 50 percent, respectively, for several of the most common operand sizes used in cryptography. The radix-2 architecture can be extended to radix-4, while preserving a factor-of-two speedup over the corresponding radix-4 design by Tenca, Todorov, and Koç from CHES 2001. Our optimization has been verified by modeling it in Verilog-HDL, implementing it on a Xilinx Virtex-II 6000 FPGA, and experimentally testing it on an SRC-6 reconfigurable computer.
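For readers unfamiliar with the underlying operation, the sketch below shows plain bit-serial radix-2 Montgomery multiplication in software. It is an illustration of the arithmetic only, under the assumption of an odd modulus m < 2^n and operands below m. It does not model the multiple-word scanning of MWR2MM or the precomputation used by the proposed architectures, and the function names are hypothetical.

```python
def mont_mul(x, y, m, n):
    """Return x * y * 2^(-n) mod m for odd m < 2^n and 0 <= x, y < m."""
    s = 0
    for i in range(n):
        if (x >> i) & 1:   # add y when bit i of x is set
            s += y
        if s & 1:          # make s even by adding the odd modulus
            s += m
        s >>= 1            # exact division by 2
    if s >= m:             # s stays below 2m, so one subtraction suffices
        s -= m
    return s


def mod_mul(a, b, m, n):
    """Ordinary a * b mod m computed through the Montgomery domain."""
    r2 = (1 << (2 * n)) % m           # R^2 mod m with R = 2^n
    a_mont = mont_mul(a, r2, m, n)    # a * R mod m
    b_mont = mont_mul(b, r2, m, n)    # b * R mod m
    prod_mont = mont_mul(a_mont, b_mont, m, n)
    return mont_mul(prod_mont, 1, m, n)  # convert back out of the domain
```

The loop runs once per operand bit, which is the software counterpart of the roughly n iterations a radix-2 design has to perform; the 2n versus n clock-cycle figures in the abstract concern how the hardware pipelines the word-level work inside those iterations.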

Mobile Ad Hoc Networking and Computing | 2003

A self-stabilizing distributed algorithm for spanning tree construction in wireless ad hoc networks

Hichem Baala; Olivier Flauzac; Jaafar Gaber; Marc Bui; Tarek A. El-Ghazawi

Spanning trees help remove cycles and establish short paths between a given node and the rest of the nodes in a network. In ad hoc mobile computing networks, however, transient node failures occur when nodes move out of range or are powered off. Therefore, we present a self-stabilizing distributed algorithm based on homogeneous agents for constructing a random spanning tree. Our approach makes use of distributed random walks as a network traversal scheme, in order to handle dynamic topology changes in ad hoc wireless networks. Each random walk is represented by a mobile agent annexing a territory over the underlying network. These multiple random walks collapse into a final one that defines the random spanning tree. It will be shown that, compared to deterministically predetermined spanning trees, our algorithm is more resilient to transient failures that occur in ad hoc mobile networks.
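How a random walk can define a spanning tree may be easier to see in a small simulation. The sketch below uses a single walker on a fixed, connected graph (the classic first-visit construction often attributed to Aldous and Broder); it is a simplified, centralized stand-in and not the multi-agent, self-stabilizing algorithm described in the paper. The function name and the adjacency-list representation are assumptions.

```python
import random


def random_walk_spanning_tree(adj, start, rng=None):
    """Build a spanning tree from the first-visit edges of one random walk.

    `adj` maps each node to a list of its neighbours; the graph is assumed
    to be connected, otherwise the walk never terminates.
    """
    rng = rng or random.Random()
    visited = {start}
    tree = []
    current = start
    while len(visited) < len(adj):
        nxt = rng.choice(adj[current])
        if nxt not in visited:      # first arrival: keep the edge used to get here
            visited.add(nxt)
            tree.append((current, nxt))
        current = nxt
    return tree
```

Because only first-visit edges are kept, the result always has one edge fewer than the number of nodes and contains no cycle, whatever path the walk happens to take.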

IEEE Computer | 2007

Guest Editors' Introduction: High-Performance Reconfigurable Computing

Duncan A. Buell; Tarek A. El-Ghazawi; Kris Gaj; Volodymyr V. Kindratenko

High-performance reconfigurable computers have the potential to exploit coarse-grained functional parallelism as well as fine-grained instruction-level parallelism through direct hardware execution on FPGAs.

Conference on High Performance Computing (Supercomputing) | 2006

UPC: Unified Parallel C

Tarek A. El-Ghazawi; Lauren Smith

UPC extends ISO C into a Partitioned Global Address Space (PGAS) programming language. UPC allows programmers to exploit data locality and parallelism in their applications, while maintaining ease of use. UPC runs on nearly all HPC platforms and has been gaining increasing support from the community. UPC is relatively easy to use for irregular access patterns, which can enable many new applications that are hard to express in other paradigms. In this BoF, the UPC consortium will share with the community the progress made in applications, specifications, tools, and implementations of UPC. Future plans will also be presented. The first half of the BoF will follow a panel format, where the members will represent the UPC consortium and will speak to different aspects of the UPC developments. The second half will be a question and answer session to promote the exchange of ideas.
