François Cantonnet
George Washington University
Publication
Featured research published by François Cantonnet.
Conference on High Performance Computing (Supercomputing) | 2002
Tarek A. El-Ghazawi; François Cantonnet
UPC, or Unified Parallel C, is a parallel extension of ANSI C. UPC follows a distributed shared memory programming model aimed at leveraging the ease of programming of the shared memory paradigm, while enabling the exploitation of data locality. UPC incorporates constructs that allow placing data near the threads that manipulate them to minimize remote accesses. This paper gives an overview of the concepts and features of UPC and establishes, through extensive performance measurements of NPB workloads, the viability of the UPC programming language compared to other popular paradigms. Further, through performance measurements we identify the challenges, the remaining steps and the priorities for UPC. It will be shown that with proper hand tuning and optimized collective operations libraries, UPC performance will be comparable to that of MPI. Furthermore, by incorporating such improvements into automatic compiler optimizations, UPC will compare quite favorably to message passing in ease of programming.
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2005
Cristian Coarfa; Yuri Dotsenko; John M. Mellor-Crummey; François Cantonnet; Tarek A. El-Ghazawi; Ashrujit Mohanti; Yiyi Yao; Daniel G. Chavarría-Miranda
Co-array Fortran (CAF) and Unified Parallel C (UPC) are two emerging languages for single-program, multiple-data global address space programming. These languages boost programmer productivity by providing shared variables for inter-process communication instead of message passing. However, the performance of these emerging languages still has room for improvement. In this paper, we study the performance of variants of the NAS MG, CG, SP, and BT benchmarks on several modern architectures to identify challenges that must be met to deliver top performance. We compare CAF and UPC variants of these programs with the original Fortran+MPI code. Today, CAF and UPC programs deliver scalable performance on clusters only when written to use bulk communication. However, our experiments uncovered some significant performance bottlenecks of UPC codes on all platforms. We account for the root causes limiting UPC performance such as the synchronization model, the communication efficiency of strided data, and source-to-source translation issues. We show that they can be remedied with language extensions, new synchronization constructs, and, finally, adequate optimizations by the back-end C compilers.
International Parallel and Distributed Processing Symposium | 2004
François Cantonnet; Yiyi Yao; Mohamed Zahran; Tarek A. El-Ghazawi
Summary form only given. Parallel programming paradigms, over the past decade, have focused on how to harness the computational power of contemporary parallel machines. Ease of use and code development productivity have been a secondary goal. Recently, however, there has been a growing interest in understanding code development productivity issues and their implications for the overall time-to-solution. Unified Parallel C (UPC) is a recently developed language that has been gaining increasing attention. UPC holds the promise of leveraging the ease of use of the shared memory model and the performance benefit of locality exploitation. The performance potential of UPC has been extensively studied in recent research efforts. The aim of this study, however, is to examine the impact of UPC on programmer productivity. We propose several productivity metrics and consider a wide array of high performance applications. Further, we compare UPC to the most widely used parallel programming paradigm, MPI. The results show that UPC compares favorably with MPI in programmer productivity.
International Parallel and Distributed Processing Symposium | 2003
François Cantonnet; Yiyi Yao; Smita Annareddy; Ahmed S. Mohamed; Tarek A. El-Ghazawi
UPC is an explicit parallel extension of ANSI C, which has been gaining increasing attention from vendors and users. In this paper, we consider the low-level monitoring and experimental performance evaluation of a new implementation of the UPC compiler on the SGI Origin family of NUMA architectures. These systems offer many opportunities for a high-performance implementation of UPC. They also offer, due to their many hardware monitoring counters, the opportunity for low-level performance measurements to guide compiler implementations. Early UPC compilers have the challenge of meeting the syntax and semantics requirements of the language. As a result, such compilers tend to focus on correctness rather than on performance. In this paper, we report on the performance of selected applications and kernels under this new compiler. The measurements were designed to help shed some light on the next steps that should be taken by UPC compiler developers to harness the full performance and usability potential of UPC under these architectures.
International Parallel and Distributed Processing Symposium | 2005
François Cantonnet; Tarek A. El-Ghazawi; Pascal Lorenz; Jaafer Gaber
The distributed shared memory (DSM) model is designed to leverage the ease of programming of the shared memory paradigm, while enabling high performance by expressing locality as in the message-passing model. Experience, however, has shown that DSM programming languages, such as UPC, may be unable to deliver the expected high level of performance. Initial investigations have shown that among the major reasons is the overhead of translating from the UPC memory model to the target architecture's virtual address space, which can be very costly. Experimental measurements have shown this overhead increasing execution time by up to three orders of magnitude. Previous work has also shown that some of this overhead can be avoided by hand-tuning, which on the other hand can significantly decrease the UPC ease of use. In addition, such tuning can only improve the performance of local shared accesses but not remote shared accesses. Therefore, a new technique that resembles translation lookaside buffers (TLBs) is proposed here. This technique, called the memory model translation buffer (MMTB), has been implemented in the GCC-UPC compiler using two alternative strategies, full-table (FT) and reduced-table (RT). It will be shown that the MMTB strategies can lead to a performance boost of up to 700%, enabling ease of programming while performing comparably to hand-tuned UPC and MPI codes.
Future Generation Computer Systems | 2006
Tarek A. El-Ghazawi; François Cantonnet; Yiyi Yao; Smita Annareddy; Ahmed S. Mohamed
Unified Parallel C (UPC) is an explicit parallel extension to ISO C which follows the Partitioned Global Address Space (PGAS) programming model. UPC, therefore, combines the ability to express parallelism with the ability to exploit locality. To do so, compilers must embody effective UPC-specific optimizations. In this paper we present a strategy for evaluating the performance of PGAS compilers. It is based on emulating possible optimizations and comparing the resulting performance to the raw compiler performance. It will be shown that this technique uncovers missed optimization opportunities. The results also demonstrate that, with such automatic optimizations, UPC performance compares favorably with that of other paradigms.
International Geoscience and Remote Sensing Symposium | 2004
Abhishek Agarwal; Jacqueline LeMoigne; Joanna Joiner; Tarek A. El-Ghazawi; François Cantonnet
Recently developed hyperspectral sensors provide much richer information than comparable multispectral sensors. However, traditional methods that have been designed for multispectral data are not easily adaptable to hyperspectral data. One way to approach this problem is to perform dimension reduction as pre-processing, i.e. to apply a transformation that brings data from a high-dimensional space to a low-dimensional one. Wavelet spectral analysis of hyperspectral images has been recently proposed as a method for dimension reduction and, when tested on the classification of AVIRIS data, has shown promising results over the traditional principal component analysis (PCA) technique. We propose to extend and apply the wavelet analysis reduction method to data from the Atmospheric Infrared Sounder (AIRS) instrument, designed to measure the Earth's atmospheric water vapor and temperature profiles on a global scale. With more than 2,000 channels, the AIRS infrared data represent a good candidate for dimension reduction, and especially wavelet reduction, due to its computational efficiency and the large data sizes involved.
Archive | 2005
Tarek A. El-Ghazawi; François Cantonnet; Yiyi Yao; Jeffrey Vetter
Archive | 2005
François Cantonnet; Tarek A. El-Ghazawi; Pascal Lorenz
ISCA PDCS | 2003
Ahmed S. Mohamed; François Cantonnet