Alexandre Carissimi
Universidade Federal do Rio Grande do Sul
Publications
Featured research published by Alexandre Carissimi.
Symposium on Computer Architecture and High Performance Computing | 2009
Christiane Pousa Ribeiro; Jean-François Méhaut; Alexandre Carissimi; Márcio Castro; Luiz Gustavo Fernandes
Currently, parallel platforms based on large-scale hierarchical shared-memory multiprocessors with Non-Uniform Memory Access (NUMA) are becoming a trend in scientific High Performance Computing (HPC). Due to their memory access constraints, these platforms require very careful data distribution. Many solutions have been proposed to address this issue; however, most of them neither optimize for numerical scientific data (array data structures) nor address portability, and they provide only a restricted set of memory policies for data placement. In this paper, we describe a user-level interface named Memory Affinity interface (MAi), which allows memory affinity control on Linux-based cache-coherent NUMA (ccNUMA) platforms. Its main goals are fine-grained data control, flexibility, and portability. The performance of MAi is evaluated on three ccNUMA platforms using numerical scientific HPC applications: the NAS Parallel Benchmarks and a geophysics application. The results show important gains (up to 31%) when compared to the Linux default solution.
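MAi's own API is not reproduced in this abstract. As a hedged illustration of the kind of per-array policy control such an interface exposes on Linux ccNUMA machines, the sketch below uses the standard libnuma calls (numa_alloc_onnode, numa_alloc_interleaved); the array names and sizes are placeholders for the example, not MAi code.

```c
/* Sketch of explicit NUMA placement on Linux using libnuma (link with -lnuma).
 * This is NOT MAi's API; it only illustrates the bind/interleave policies
 * that such memory-affinity interfaces typically wrap. */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>

#define N (1L << 20)

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return EXIT_FAILURE;
    }

    /* Bind one array to node 0: useful when one group of threads owns it. */
    double *bound = numa_alloc_onnode(N * sizeof(double), 0);

    /* Interleave another array across all nodes: useful for shared,
     * uniformly accessed data. */
    double *spread = numa_alloc_interleaved(N * sizeof(double));
    if (!bound || !spread) return EXIT_FAILURE;

    for (long i = 0; i < N; i++) {
        bound[i]  = 1.0;
        spread[i] = 2.0;
    }
    printf("bound[0]=%f spread[0]=%f\n", bound[0], spread[0]);

    numa_free(bound, N * sizeof(double));
    numa_free(spread, N * sizeof(double));
    return EXIT_SUCCESS;
}
```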
IEEE International Conference on Cloud Computing Technology and Science | 2012
Eduardo Roloff; Matthias Diener; Alexandre Carissimi; Philippe Olivier Alexandre Navaux
High-Performance Computing (HPC) in the cloud has reached the mainstream and is currently a hot topic in the research community and in industry. The attractiveness of the cloud for HPC is the capability to run large applications on powerful, scalable hardware without needing to own or maintain this hardware. In this paper, we conduct a detailed comparison of HPC applications running on three cloud providers: Amazon EC2, Microsoft Azure, and Rackspace. We analyze three important characteristics of HPC - deployment facilities, performance, and cost efficiency - and compare them to a cluster of machines. For the experiments, we used the well-known NAS Parallel Benchmarks as an example of general scientific HPC applications to examine computational and communication performance. Our results show that HPC applications can run efficiently on the cloud. However, care must be taken when choosing the provider, as the differences between them are large. The best cloud provider depends on the type and behavior of the application, as well as the intended usage scenario. Furthermore, our results show that HPC in the cloud can have higher performance and cost efficiency than a traditional cluster, by up to 27% and 41%, respectively.
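The abstract does not spell out how cost efficiency is computed; a common reading, assumed here, is performance obtained per unit of money spent. The sketch below illustrates that reading only; the runtime and hourly price are placeholders, not figures from the paper.

```c
/* Hedged sketch: cost efficiency as "work per dollar".
 * The formula and the numbers are illustrative assumptions, not the paper's data. */
#include <stdio.h>

int main(void) {
    double runtime_s   = 3600.0;  /* placeholder benchmark runtime (seconds)   */
    double price_per_h = 0.90;    /* placeholder instance price (dollars/hour) */

    double cost_per_run    = price_per_h * runtime_s / 3600.0; /* $ per run   */
    double performance     = 1.0 / runtime_s;                  /* runs per s  */
    double cost_efficiency = 1.0 / cost_per_run;                /* runs per $  */

    printf("performance = %g runs/s, cost efficiency = %g runs/$\n",
           performance, cost_efficiency);
    return 0;
}
```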
IEEE International Conference on High Performance Computing, Data and Analytics | 2010
Christiane Pousa Ribeiro; Márcio Castro; Jean-François Méhaut; Alexandre Carissimi
In numerical scientific High Performance Computing (HPC), Non-Uniform Memory Access (NUMA) platforms are now commonplace. On such platforms, memory affinity management remains an important concern in overcoming the memory wall problem. Prior solutions have drawbacks such as machine dependency and a limited set of memory policies. This paper introduces Minas, a framework that provides either explicit or automatic memory affinity management with architecture abstraction for ccNUMA platforms. We evaluate our solution on two ccNUMA platforms using two parallel geophysics applications. The results show performance improvements in comparison with other solutions available for Linux.
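Minas's explicit interface is not shown in the abstract. As an assumed illustration of what "explicit memory affinity management" means at the operating-system level, the sketch below binds a mapped region to NUMA node 0 with the Linux mbind system call; the region size is a placeholder and this is not Minas code.

```c
/* Sketch of explicit data placement with the Linux mbind(2) syscall
 * (link with -lnuma). This is not Minas's API, only the OS mechanism that
 * explicit-placement frameworks rely on. */
#include <numaif.h>
#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    size_t len = 4UL << 20;  /* 4 MiB region; size is illustrative */

    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return EXIT_FAILURE; }

    unsigned long nodemask = 1UL;  /* bit 0 set: allow only NUMA node 0 */
    if (mbind(buf, len, MPOL_BIND, &nodemask, sizeof(nodemask) * 8, 0) != 0) {
        perror("mbind");
        return EXIT_FAILURE;
    }

    /* Pages of this region are now allocated from node 0 when first touched. */
    ((char *)buf)[0] = 1;

    munmap(buf, len);
    return EXIT_SUCCESS;
}
```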
High Performance Computing and Communications | 2009
Rodrigo da Rosa Righi; Laércio Lima Pilla; Alexandre Carissimi; Philippe Olivier Alexandre Navaux; Hans-Ulrich Heiss
We have developed a model called MigBSP that controls process rescheduling in BSP (Bulk Synchronous Parallel) applications. A BSP application is composed of one or more supersteps, each containing computation and communication phases followed by a synchronization barrier. Since the barrier waits for the slowest process, MigBSP's goal is to adjust the location of the processes in order to reduce the supersteps' times. Within the scope of the BSP model, the novel ideas of MigBSP are: (i) the combination of three metrics - Memory, Computation and Communication - to measure the Potential of Migration of each BSP process; (ii) the use of both Computation and Communication Patterns to capture process regularity; (iii) adaptation of the periodicity with which process rescheduling is launched. This paper describes MigBSP and presents experimental results and related work.
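The superstep structure described above maps naturally onto an SPMD code. A minimal MPI sketch of one BSP superstep (local computation, communication, synchronization barrier) is shown below; MigBSP's own rescheduling logic is not part of it, and the per-superstep work and data are placeholders.

```c
/* Minimal BSP superstep skeleton in MPI: compute, communicate, synchronize. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = rank, global = 0.0;

    for (int superstep = 0; superstep < 10; superstep++) {
        /* 1. Computation phase: purely local work. */
        local = local * 1.0001 + superstep;

        /* 2. Communication phase: exchange data with the other processes. */
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        /* 3. Synchronization barrier: the superstep ends only when the
         *    slowest process arrives, which is the time MigBSP tries to
         *    shorten by relocating processes. */
        MPI_Barrier(MPI_COMM_WORLD);
    }

    if (rank == 0) printf("final global value: %f\n", global);
    MPI_Finalize();
    return 0;
}
```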
Lecture Notes in Computer Science | 2003
Leomar S. da Rosa; Flávio Rech Wagner; Luigi Carro; Alexandre Carissimi; André Inácio Reis
This paper presents the implementation of different scheduling policies on a Java microcontroller. Seven new instructions were added to the architecture to support context switching and scheduler implementation. Using these instructions, four schedulers following the POSIX standard were developed for this architecture. These schedulers were then used in a study of the impact of different scheduling policies on embedded systems applications. Several design costs are discussed, including the hardware cost of the extended instructions, the ROM and RAM capacity used, the number of cycles needed to run the chosen scheduler and the application, and the power consumption overhead. Experiments show that the exploration of different scheduling alternatives, as well as careful scheduler implementation, may play an important role in performance optimization.
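The schedulers in the paper run on a Java microcontroller, whose code is not available here. Purely as a reference point for the POSIX policies they follow, the sketch below requests one of the standard policies (SCHED_RR) for a thread through the pthread attribute API; this is not the paper's implementation.

```c
/* Sketch: selecting a POSIX scheduling policy for a thread with pthreads.
 * Real-time policies require privileges (e.g. root or CAP_SYS_NICE on Linux). */
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *worker(void *arg) {
    (void)arg;
    printf("worker running\n");
    return NULL;
}

int main(void) {
    pthread_attr_t attr;
    struct sched_param param;

    pthread_attr_init(&attr);
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&attr, SCHED_RR);        /* round-robin policy */
    param.sched_priority = sched_get_priority_min(SCHED_RR);
    pthread_attr_setschedparam(&attr, &param);

    pthread_t tid;
    if (pthread_create(&tid, &attr, worker, NULL) != 0) {
        fprintf(stderr, "pthread_create failed (insufficient privileges?)\n");
        return 1;
    }
    pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}
```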
European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface | 1998
Alexandre Carissimi; Marcelo Pasin
Current parallel programming models such as message passing properly exploit coarse-grain parallelism and are well suited to regular applications. However, many applications have irregular behaviour and fine-grain parallelism, for which multithreading is more suitable. Thanks to technological progress, multiprocessing and clustering have become cost-effective ways to build distributed-memory parallel machines. This paper discusses Athapascan, a multithreaded, portable, parallel programming runtime system targeted at irregular applications. It is designed to integrate multithreading and communication, taking advantage of both multiprocessors and communication networks.
IEEE International Symposium on Parallel and Distributed Processing, Workshops and PhD Forum | 2010
Christiane Pousa Ribeiro; Jean-François Méhaut; Alexandre Carissimi
Nowadays, on multi-core multiprocessors with hierarchical memory (Non-Uniform Memory Access (NUMA) characteristics), the number of cores accessing the memory banks is considerably high. Such accesses stress the memory banks, generating load-balancing issues, memory contention, and remote accesses. In this context, managing memory accesses efficiently remains an important concern. To reduce memory access costs, developers have to manage data placement in their applications to ensure memory affinity. The problem is: how to guarantee memory affinity for different applications and NUMA platforms while ensuring efficiency, portability, minimal or no source code changes (transparency), and fine control of memory access patterns? In this thesis, our research has led to the proposal of Minas: an efficient and portable memory affinity management framework for NUMA platforms. Minas provides both explicit and automatic memory affinity management with good performance, architecture abstraction, minimal or no application source code modifications, and fine control. We have evaluated its efficiency and portability through experiments with numerical scientific HPC applications on NUMA platforms. The results have been compared with other solutions for managing memory affinity.
International Symposium on Computers and Communications | 2010
Rodrigo da Rosa Righi; Laércio Lima Pilla; Alexandre Carissimi; Philippe Olivier Alexandre Navaux; Hans-Ulrich Heiss
In this paper we describe a model for BSP (Bulk Synchronous Parallel) process rescheduling called MigBSP. Within the scope of BSP applications, its distinguishing approach is the combination of three metrics - Memory, Computation and Communication - to measure the Potential of Migration of each BSP process. In this context, this paper addresses both the efficiency and the adaptivity of the model on our multi-cluster machine.
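The abstract names the three metrics but not the exact way they are combined. The sketch below shows one natural combination (computation plus communication affinity minus memory/migration cost for a process and a candidate destination); the function, names, and values are assumptions for illustration, not MigBSP's published formula.

```c
/* Hedged sketch: combining Computation, Communication and Memory into a
 * single Potential of Migration score for one (process, destination) pair.
 * This combination is an illustrative assumption, not MigBSP's exact formula. */
#include <stdio.h>

/* Higher computation/communication affinity to the destination raises the
 * score; a larger memory footprint (migration cost) lowers it. */
static double potential_of_migration(double comp, double comm, double mem) {
    return comp + comm - mem;
}

int main(void) {
    /* Placeholder metric values for one BSP process and one candidate site. */
    double comp = 0.8, comm = 0.5, mem = 0.3;
    double pm = potential_of_migration(comp, comm, mem);
    printf("PM = %.2f -> %s\n", pm, pm > 0.0 ? "candidate for migration" : "stay");
    return 0;
}
```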
IEEE International Conference on High Performance Computing, Data and Analytics | 2010
Rodrigo da Rosa Righi; Laércio Lima Pilla; Alexandre Carissimi; Philippe Olivier Alexandre Navaux; Hans-Ulrich Heiss
Process migration is a useful mechanism for load balancing. In this context, we developed a model called MigBSP that controls process rescheduling in BSP applications. MigBSP is especially pertinent for obtaining performance in this type of application, since BSP applications are composed of supersteps that always wait for the slowest process. In this paper, we focus on the BSP-based modeling of the widely used LU decomposition algorithm as well as its execution with MigBSP. The use of multiple metrics to decide migrations, together with adaptations of the rescheduling frequency, makes possible gains of up to 19% on our cluster-of-clusters architecture. Finally, we show that it is possible to gain performance in LU effortlessly by using novel migration algorithms.
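The BSP modeling of LU is only summarized above. A compact sketch of the superstep structure it implies is shown below: the owner of pivot row k broadcasts it, every process updates the rows it owns, and a barrier closes the step. The cyclic row distribution, matrix size, and contents are assumptions for illustration, not the paper's exact formulation.

```c
/* Sketch: LU factorization (no pivoting) expressed as BSP-style supersteps
 * with MPI. Row i is owned by process i % size (cyclic distribution). */
#include <mpi.h>

#define N 8

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each process stores the full matrix for simplicity, but only updates
     * the rows it owns. */
    double a[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = (i == j) ? N : 1.0;   /* diagonally dominant placeholder */

    for (int k = 0; k < N; k++) {           /* one superstep per pivot row */
        /* Communication: the owner of row k broadcasts the current pivot row. */
        MPI_Bcast(a[k], N, MPI_DOUBLE, k % size, MPI_COMM_WORLD);

        /* Computation: update the locally owned rows below the pivot. */
        for (int i = k + 1; i < N; i++) {
            if (i % size != rank) continue;
            double m = a[i][k] / a[k][k];
            a[i][k] = m;                     /* store the L multiplier */
            for (int j = k + 1; j < N; j++)
                a[i][j] -= m * a[k][j];
        }

        /* Synchronization barrier closing the superstep. */
        MPI_Barrier(MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```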
Parallel Computing | 2008
Fabrice Dupros; Christiane Pousa Ribeiro; Alexandre Carissimi; Jean-François Méhaut
Simulation of large-scale seismic wave propagation is an important tool in seismology for efficient strong motion analysis and risk mitigation. Being particularly CPU-consuming, this three-dimensional problem relies on parallel computing to improve the performance and the accuracy of the simulations. The trend in parallel computing is to increase the number of cores available at the shared-memory level, with a possibly non-uniform cost of memory accesses. We therefore need to consider new approaches better suited to such parallel systems. In this paper, we first report on the impact of memory affinity on the parallel performance of seismic simulations. We then introduce a methodology combining efficient thread scheduling and careful data placement to overcome the limitations arising from both the parallel algorithm and the memory hierarchy. MAi (Memory Affinity interface) is used to smoothly adapt the memory policy to the underlying architecture. We evaluate our methodology on computing nodes with different NUMA characteristics. A maximum gain of 53% is reported in comparison with a classical OpenMP implementation.
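The paper itself uses MAi to set the memory policy. As a hedged, generic illustration of "careful data placement" on ccNUMA nodes, the sketch below shows first-touch placement with OpenMP: initializing data with the same static schedule as the compute loop so each page lands on the node of the thread that will later use it. Array names and sizes are placeholders, and this is not the paper's MAi-based approach.

```c
/* Sketch: first-touch data placement with OpenMP on a NUMA node. */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N (1L << 24)

int main(void) {
    double *u = malloc(N * sizeof(double));
    double *v = malloc(N * sizeof(double));
    if (!u || !v) return EXIT_FAILURE;

    /* First touch: pages of u and v are allocated on the node of the
     * thread that initializes them. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++) {
        u[i] = 0.0;
        v[i] = 1.0;
    }

    /* Compute loop with the same static schedule: each thread mostly
     * accesses pages that are local to its NUMA node. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++)
        u[i] += 0.5 * v[i];

    printf("u[0] = %f\n", u[0]);
    free(u);
    free(v);
    return EXIT_SUCCESS;
}
```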
Collaboration
Frequent collaborators of Alexandre Carissimi.
Philippe Olivier Alexandre Navaux
Universidade Federal do Rio Grande do Sul
Eduardo Henrique Molina da Cruz
Universidade Federal do Rio Grande do Sul