Network


Latest external collaborations at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Mauricio Hanzich is active.

Publication


Featured research published by Mauricio Hanzich.


IEEE Transactions on Parallel and Distributed Systems | 2011

Assessing Accelerator-Based HPC Reverse Time Migration

Mauricio Araya-Polo; Javier Cabezas; Mauricio Hanzich; Miquel Pericàs; Felix Rubio; Isaac Gelado; Muhammad Shafiq; Enric Morancho; Nacho Navarro; Eduard Ayguadé; José María Cela; Mateo Valero

Oil and gas companies trust Reverse Time Migration (RTM), the most advanced seismic imaging technique, with crucial decisions on drilling investments. The economic value of the oil reserves that require RTM to be localized is on the order of 10^13 dollars. But RTM requires vast computational power, which has somewhat hindered its practical success. Although accelerator-based architectures deliver enormous computational power, little attention has been devoted to assessing the effort of implementing RTM on them. The aim of this paper is to identify the major limitations imposed by different accelerators during RTM implementations, as well as potential bottlenecks regarding architectural features. Moreover, we suggest a wish list of features that, from our experience, should be included in the next generation of accelerators to cope with the requirements of applications like RTM. We present an RTM algorithm mapping to the IBM Cell/B.E., NVIDIA Tesla, and an FPGA platform modeled after the Convey HC-1. All three implementations outperform a traditional processor (Intel Harpertown) in terms of performance (10x), but at the cost of huge development effort, mainly due to immature development frameworks and the lack of well-suited programming models. These results show that accelerators are well-positioned platforms for this kind of workload. Because our RTM implementation is based on an explicit high-order finite-difference scheme, some of the conclusions of this work can be extrapolated to applications with similar numerical schemes, for instance, magneto-hydrodynamics or atmospheric flow simulations.
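The abstract notes that this RTM implementation is based on an explicit high-order finite-difference scheme. As an illustration only (the paper's 3D kernels are not reproduced here; the 1D setting, the 8th-order stencil, and all function names below are generic textbook choices), such a time step looks like:

```python
import numpy as np

# Standard 8th-order central coefficients for the second derivative.
C = np.array([-205.0 / 72.0, 8.0 / 5.0, -1.0 / 5.0, 8.0 / 315.0, -1.0 / 560.0])

def laplacian8(p, dx):
    """8th-order finite-difference second derivative (interior points only)."""
    lap = np.zeros_like(p)
    core = C[0] * p[4:-4]
    for k in range(1, 5):
        core = core + C[k] * (p[4 + k:len(p) - 4 + k] + p[4 - k:-4 - k])
    lap[4:-4] = core / dx**2
    return lap

def step(p_prev, p_cur, vel, dt, dx):
    """One explicit leapfrog time step of the 1D acoustic wave equation."""
    return 2.0 * p_cur - p_prev + (vel * dt)**2 * laplacian8(p_cur, dx)
```

Each output point touches nine input points yet performs only a handful of flops per byte moved, which is why kernels of this family stress memory bandwidth far more than raw compute.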


IEEE International Conference on High Performance Computing, Data and Analytics | 2009

3D seismic imaging through reverse-time migration on homogeneous and heterogeneous multi-core processors

Mauricio Araya-Polo; Felix Rubio; Raúl de la Cruz; Mauricio Hanzich; José María Cela; Daniele Paolo Scarpazza

Reverse-Time Migration (RTM) is a state-of-the-art technique in seismic acoustic imaging because of the quality and integrity of the images it provides. Oil and gas companies trust RTM with crucial decisions on multi-million-dollar drilling investments. But RTM requires vastly more computational power than its predecessor techniques, and this has somewhat hindered its practical success. On the other hand, although multi-core architectures promise to deliver unprecedented computational power, little attention has been devoted to mapping RTM efficiently to multi-cores. In this paper, we present a mapping of the RTM computational kernel to the IBM Cell/B.E. processor that reaches close-to-optimal performance. The kernel proves to be memory-bound and achieves 98% utilization of the peak memory bandwidth. Our Cell/B.E. implementation outperforms a traditional processor (PowerPC 970MP) in terms of performance (a 15.0× speedup) and energy efficiency (a 10.0× increase in the GFlops/W delivered). To the best of our knowledge, it is also the fastest RTM implementation available. These results increase the practical usability of RTM, and the RTM-Cell/B.E. combination proves to be a strong competitor in the seismic arena.
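The memory-bound claim can be sanity-checked with a back-of-envelope roofline estimate. The numbers below are illustrative assumptions (an idealized 25-point single-precision stencil with perfect cache reuse, and the commonly quoted nominal Cell/B.E. peaks of roughly 204.8 Gflop/s and 25.6 GB/s), not measurements from the paper:

```python
# Arithmetic intensity of an idealized 25-point float32 stencil:
flops_per_point = 2 * 25      # one multiply-add per stencil coefficient
bytes_per_point = 4 * 4       # read p_cur, p_prev, vel; write p_next (float32)
intensity = flops_per_point / bytes_per_point   # flops per byte of traffic

# Machine balance of the Cell/B.E. (assumed nominal peak figures):
balance = 204.8 / 25.6                          # flops per byte available

# intensity < balance, so the FPUs cannot be saturated: the kernel is
# limited by memory bandwidth, consistent with the abstract's finding.
memory_bound = intensity < balance
```

Under these assumptions the kernel offers about 3.1 flops per byte while the machine can sustain 8, so bandwidth, not compute, sets the ceiling.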


European Conference on Parallel Processing | 2005

CISNE: a new integral approach for scheduling parallel applications on non-dedicated clusters

Mauricio Hanzich; Francesc Giné; Porfidio Hernández; Francesc Solsona; Emilio Luque

Our main interest is oriented towards keeping both local and parallel jobs together in a non-dedicated cluster. In order to obtain some profit from the parallel applications, it is important to consider time and space sharing as a means of enhancing the scheduling decisions. In this work, we introduce an integral scheduling system for non-dedicated clusters, termed CISNE. It combines a previously developed dynamic coscheduling system with a space-sharing job scheduler to make better scheduling decisions than either could make separately. CISNE allows multiple parallel applications to be executed concurrently in a non-dedicated Linux cluster with good performance, both from the point of view of the local user and from that of the parallel application user. This is achieved without disturbing the local user while still obtaining profit for the parallel user. The good performance of CISNE has been evaluated in a Linux cluster.


IEEE Micro | 2015

Broadcast-Enabled Massive Multicore Architectures: A Wireless RF Approach

Sergi Abadal; Benny Sheinman; Oded Katz; Ofer Markish; Danny Elad; Yvan Fournier; Damian Roca; Mauricio Hanzich; Guillaume Houzeaux; Mario Nemirovsky; Eduard Alarcón; Albert Cabellos-Aparicio

Broadcast traditionally has been regarded as a prohibitive communication transaction in multiprocessor environments. Nowadays, such a constraint largely drives the design of architectures and algorithms all-pervasive in diverse computing domains, directly and indirectly leading to diminishing performance returns as the many-core era is approaching. Novel interconnect technologies could help revert this trend by offering, among others, improved broadcast support, even in large-scale chip multiprocessors. This article outlines the prospects of wireless on-chip communication technologies pointing toward low-latency (a few cycles) and energy-efficient broadcast (a few picojoules per bit). It also discusses the challenges and potential impact of adopting these technologies as key enablers of unconventional hardware architectures and algorithmic approaches, in the pathway of significantly improving the performance, energy efficiency, scalability, and programmability of many-core chips.


Lecture Notes in Computer Science | 2004

Coscheduling and Multiprogramming Level in a Non-dedicated Cluster

Mauricio Hanzich; Francesc Giné; Porfidio Hernández; Francesc Solsona; Emilio Luque

Our interest is oriented towards keeping both local and parallel jobs together in a time-sharing, non-dedicated cluster. In such systems, developing dynamic coscheduling techniques, without memory restrictions, that consider the MultiProgramming Level (MPL) of parallel applications is a main goal in current cluster research. In this paper, a new technique called Cooperating Coscheduling (CCS), which combines a dynamic coscheduling system with a resource balancing scheme, is applied.


Computers & Geosciences | 2014

Finite-difference staggered grids in GPUs for anisotropic elastic wave propagation simulation

Felix Rubio; Mauricio Hanzich; Albert Farrés; Josep de la Puente; José María Cela

The 3D elastic wave equations can be used to simulate the physics of waves traveling through the Earth more precisely than acoustic approximations. However, this improvement in quality comes at a higher cost in the numerical scheme. A possible strategy to mitigate that expense is to use specialized, high-performing architectures such as GPUs. Nevertheless, porting and optimizing a code for such a platform require a deep understanding of both the underlying hardware architecture and the algorithm at hand. Furthermore, for very large problems, multiple GPUs must work concurrently, which adds yet another layer of complexity to the codes. In this work, we have tackled the problem of porting and optimizing a 3D elastic wave propagation engine, which supports both standard and fully staggered grids, to multi-GPU clusters. At the single-GPU level, we have proposed and evaluated many optimization strategies and adopted the best-performing ones for our final code. At the distributed-memory level, a domain decomposition approach has been used, which allows for good scalability thanks to asynchronous communications and I/O.

Highlights:
- We use staggered grids to simulate elastic wave propagation through the Earth.
- As performance is critical, we make use of GPUs to accelerate the simulation.
- We show how the work is distributed among the nodes in a domain decomposition case.
- Depending on the complexity of the simulation, we obtain speed-ups from 10× to 14×.
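The multi-GPU port described above relies on domain decomposition with halo (ghost-cell) exchanges between neighbouring subdomains. A minimal 1D sketch of that data movement follows; the function names and the halo width of 4 are assumptions for illustration, and the paper's asynchronous device-to-device transfers are modelled here as plain array copies:

```python
import numpy as np

HALO = 4  # stencil half-width; boundary cells exchanged every time step

def decompose(field, parts):
    """Split a 1D field into subdomains, each padded with halo cells."""
    return [np.pad(chunk, HALO) for chunk in np.array_split(field, parts)]

def exchange_halos(domains):
    """Copy each neighbour's edge cells into the adjacent halo region.
    In a real multi-GPU code these copies would be asynchronous transfers
    overlapped with computation on the interior cells."""
    for i, d in enumerate(domains):
        if i > 0:  # receive left neighbour's rightmost interior cells
            d[:HALO] = domains[i - 1][-2 * HALO:-HALO]
        if i < len(domains) - 1:  # receive right neighbour's leftmost interior cells
            d[-HALO:] = domains[i + 1][HALO:2 * HALO]
```

After an exchange, each subdomain can apply the stencil to its interior independently, which is what makes the decomposition scale.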


SEG Technical Program Expanded Abstracts | 2008

Evaluation of 3D RTM On HPC Platforms

Francisco Ortigosa; Mauricio Araya-Polo; Felix Rubio; Mauricio Hanzich; Raúl de la Cruz; José María Cela

Reverse Time Migration (RTM) has become the latest chapter in seismic imaging for geologically complex subsurface areas. In particular, it has proven to be very useful for the subsalt oil plays of the US Gulf of Mexico. However, RTM cannot be applied extensively due to its extreme computational demands. The recent availability of multi-core processors, both homogeneous and heterogeneous, may provide the required compute power. In this paper, we benchmark an effective RTM algorithm on several HPC platforms to assess the viability of the hardware.


Journal of Parallel and Distributed Computing | 2013

State-based predictions with self-correction on Enterprise Desktop Grid environments

Josep L. Lérida; Francesc Solsona; Porfidio Hernández; Francesc Giné; Mauricio Hanzich; Josep Conde

The abundant computing resources in current organizations provide new opportunities for executing parallel scientific applications and improving resource usage. The Enterprise Desktop Grid Computing (EDGC) paradigm addresses the potential for harvesting the idle computing resources of an organization's desktop PCs to support the execution of the company's large-scale applications. In these environments, the accuracy of response-time predictions is essential for effective metascheduling that maximizes resource usage without harming the performance of the parallel and local applications. However, this accuracy is a major challenge due to the heterogeneity and non-dedicated nature of EDGC resources. In this paper, two new prediction techniques are presented based on the state of the resources. A thorough analysis by linear regression demonstrated that the proposed techniques capture the real behavior of the parallel applications better than other common techniques in the literature. Moreover, it is possible to reduce deviations with proper modeling of prediction errors; thus, a Self-adjustable Correction method (SAC) for detecting and correcting prediction deviations was proposed, with the ability to adapt to changes in load conditions. An extensive evaluation in a real environment was conducted to validate the SAC method. The results show that the use of SAC increases the accuracy of response-time predictions by 35%. The cost of predictions with self-correction and their accuracy in a real environment were analyzed using a combination of the proposed techniques. The results demonstrate that the cost of the predictions is negligible and that the combined use of the prediction techniques is preferable.
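The idea of correcting a base prediction with recently observed errors can be sketched in a few lines. This is an illustrative toy, not the paper's SAC formulation: the linear state model, its parameters, and the simple moving-average correction are all assumptions made for the example:

```python
class SelfCorrectingPredictor:
    """Toy response-time predictor: a fitted linear model on resource state
    plus a running correction built from recent prediction errors."""

    def __init__(self, base_time, load_coeff, window=5):
        self.base = base_time    # hypothetical fitted intercept (seconds)
        self.coeff = load_coeff  # hypothetical fitted slope vs. CPU load
        self.window = window     # how many recent errors to average
        self.errors = []

    def _raw(self, cpu_load):
        return self.base + self.coeff * cpu_load

    def predict(self, cpu_load):
        recent = self.errors[-self.window:]
        bias = sum(recent) / len(recent) if recent else 0.0
        return self._raw(cpu_load) + bias

    def observe(self, cpu_load, actual):
        """Record the deviation so that future predictions self-correct."""
        self.errors.append(actual - self._raw(cpu_load))
```

The windowed bias term is what lets the predictor adapt when load conditions drift away from the conditions the base model was fitted under.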


International Symposium on Parallel and Distributed Processing and Applications | 2006

MetaLoRaS: a predictable metascheduler for non-dedicated multiclusters

Josep L. Lérida; Francesc Solsona; Francesc Giné; Mauricio Hanzich; Porfidio Hernández; Emilio Luque

The main aim of this work is to take advantage of the computing resources of a single organization, or Multicluster system, to execute distributed applications efficiently without excessively damaging the local users. To do so, we propose a metascheduler environment named MetaLoRaS, with a two-level hierarchical architecture, for a non-dedicated Multicluster with job prediction capabilities. Among other metaschedulers, the non-dedicated feature and an efficient prediction system are the most distinctive characteristics of MetaLoRaS. Another important contribution is the effective cluster selection mechanism, based on the prediction system. In this article, we show how the hierarchical architecture and simulation mechanism are the best choices if we want to obtain an accurate prediction system and, at the same time, the best turnaround times for distributed jobs without damaging local user performance.
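The prediction-based cluster selection described above reduces, in spirit, to choosing the cluster whose estimated turnaround is lowest. A hedged sketch follows; the prediction function, its inputs, and the toy data are invented for illustration and do not reflect MetaLoRaS's actual simulation mechanism:

```python
def select_cluster(clusters, job, predict):
    """Pick the cluster with the lowest predicted turnaround for the job."""
    return min(clusters, key=lambda cluster: predict(cluster, job))

# Toy prediction: queued work plus run time inflated by the cluster's load.
def toy_predict(cluster, job):
    return cluster["queued"] + job["runtime"] * (1.0 + cluster["load"])

clusters = [
    {"name": "A", "queued": 30.0, "load": 0.8},
    {"name": "B", "queued": 10.0, "load": 0.2},
]
best = select_cluster(clusters, {"runtime": 60.0}, toy_predict)
```

The quality of the whole scheme hinges on the accuracy of `predict`, which is exactly why the abstract emphasizes the prediction system over the selection rule itself.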


IEEE International Conference on High Performance Computing, Data and Analytics | 2014

Lossy Data Compression with DCT Transforms

F. Rubio Dalmau; Mauricio Hanzich; J. de la Puente; N. Gutierrez

In today's computers, disks are among the slowest components and thus a major performance bottleneck. To overcome their limitations, researchers and developers must re-engineer their codes to save data to disk less often and to reduce the amount of data saved. This paper focuses on overcoming the limitations imposed by the local disk in terms of transfer time and the total size of the resulting input/output (I/O) operations. To do this, we have developed a variation of a well-known family of image compression algorithms based on the Discrete Cosine Transform (DCT), which has already proved its properties in terms of bandwidth compression and precision. By using this compression scheme, we are able to perform the simulation without having to resort to additional techniques such as checkpointing or random boundaries. We show that, with a well-designed compression algorithm, the time to the result can be dramatically reduced, as well as the disk space required to obtain it. Finally, we compare a raw industrial synthetic shot with one processed with our scheme, showing that the differences between them are affordable while the compression ratio obtained is considerable.
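The core of a DCT-based lossy scheme is keeping only the significant transform coefficients of each data block. A minimal sketch of that idea follows; the block size, the coefficient-selection rule, and the orthonormal DCT-II construction are generic textbook choices, not the paper's tuned algorithm:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II transform matrix (its inverse is its transpose)."""
    j = np.arange(n)
    M = np.cos(np.pi * (2 * j[None, :] + 1) * j[:, None] / (2 * n))
    M[0] /= np.sqrt(2)
    return M * np.sqrt(2.0 / n)

def compress(block, keep):
    """Lossy step: zero out all but the `keep` largest DCT coefficients."""
    coeffs = dct_matrix(len(block)) @ block
    coeffs[np.argsort(np.abs(coeffs))[:-keep]] = 0.0
    return coeffs

def decompress(coeffs):
    """Reconstruct the block from its surviving coefficients."""
    return dct_matrix(len(coeffs)).T @ coeffs
```

Smooth wavefields concentrate their energy in a few low-frequency coefficients, so most entries can be dropped with a small reconstruction error, trading a little precision for much less disk traffic, as the abstract describes.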

Collaboration


Dive into Mauricio Hanzich's collaboration.

Top Co-Authors

N. Gutierrez
Barcelona Supercomputing Center

Porfidio Hernández
Autonomous University of Barcelona

Albert Farrés
Barcelona Supercomputing Center

José María Cela
Barcelona Supercomputing Center

Jean Kormann
Barcelona Supercomputing Center

Miguel Ferrer
Barcelona Supercomputing Center

Emilio Luque
Autonomous University of Barcelona

Felix Rubio
Barcelona Supercomputing Center