Moisés Viñas
Grupo México
Publication
Featured research published by Moisés Viñas.
Journal of Parallel and Distributed Computing | 2013
Moisés Viñas; Zeki Bozkus; Basilio B. Fraguela
While recognition of the advantages of heterogeneous computing is steadily growing, the issues of programmability and portability hinder its exploitation. The introduction of the OpenCL standard was a major step forward in that it provides code portability, but its interface is even more complex than that of other approaches. In this paper, we present the Heterogeneous Programming Library (HPL), which permits the development of heterogeneous applications addressing both portability and programmability while not sacrificing high performance. This is achieved by means of an embedded language and data types provided by the library with which generic computations to be run in heterogeneous devices can be expressed. A comparison in terms of programmability and performance with OpenCL shows that both approaches offer very similar performance, while outlining the programmability advantages of HPL.
Concurrency and Computation: Practice and Experience | 2013
Moisés Viñas; Jacobo Lobeiras; Basilio B. Fraguela; Manuel Arenaz; Margarita Amor; José A. García; Manuel J. Castro; Ramón Doallo
This work presents cost‐effective multi‐graphics processing unit (GPU) parallel implementations of a finite‐volume numerical scheme for solving pollutant transport problems in bidimensional domains. The fluid is modeled by 2D shallow‐water equations, whereas the transport of pollutant is modeled by a transport equation. The 2D domain is discretized using a first‐order Roe finite‐volume scheme. Specifically, this paper presents multi‐GPU implementations of both a solution that exploits recomputation on the GPU and an optimized solution that is based on a ghost cell decoupling approach. Our multi‐GPU implementations have been optimized using nonblocking communications, overlapping of communications and computations, and the application of ghost cell expansion to minimize communications. The fastest one reached a speedup of 78× using four GPUs on an InfiniBand network with respect to a parallel execution on a multicore CPU with six cores and two‐way hyperthreading per core. Such performance, measured using a realistic problem, enabled the calculation of solutions not only in real time but also orders of magnitude faster than the simulated time.
IEEE International Conference on High Performance Computing Data and Analytics | 2013
Jacobo Lobeiras; Moisés Viñas; Margarita Amor; Basilio B. Fraguela; Manuel Arenaz; José Antonio Orosa García; Manuel J. Castro
In this work, several parallel implementations of a numerical model of pollutant transport on a shallow water system are presented. These parallel implementations are developed in two phases. First, the sequential code is rewritten to exploit the stream programming model. Second, the streamed code is targeted at current multi-threaded systems, in particular multi-core CPUs and modern GPUs. The performance is evaluated on a multi-core CPU using OpenMP, and on a GPU using the streaming-oriented programming language Brook+, as well as the standard language for heterogeneous systems, OpenCL.
International Conference on High Performance Computing and Simulation | 2011
Moisés Viñas; Jacobo Lobeiras; Basilio B. Fraguela; Manuel Arenaz; Margarita Amor; Ramón Doallo
Shallow water simulation enables the study of problems such as dam break, river, canal and coastal hydrodynamics, as well as the transport of inert substances, such as pollutants, in a fluid. This article describes an efficient and cost-effective CUDA implementation on GPUs of a finite volume numerical scheme for solving pollutant transport problems in bidimensional domains. The fluid is modeled by 2D shallow water equations, while the transport of pollutant is modeled by a transport equation. The 2D domain is discretized using a first order finite volume scheme. The evaluation using a realistic problem shows that the implementation makes good use of the computational resources, being very efficient for real-life complex simulations. The speedup reached allowed us to complete a simulation in 2 hours in contrast with the 239 hours (10 days) required by a sequential execution on a standard CPU.
Journal of Parallel and Distributed Computing | 2017
Moisés Viñas; Basilio B. Fraguela; Diego Andrade; Ramón Doallo
Heterogeneous devices require much more work from programmers than traditional CPUs, particularly when there are several of them, as each one has its own memory space. Multi-device applications require distributing kernel executions and, even worse, array portions that must be kept coherent among the different device memories and the host memory. In addition, when devices with different characteristics participate in a computation, optimally distributing the work among them is not trivial. In this paper we extend an existing framework for the programming of accelerators called Heterogeneous Programming Library (HPL) with three kinds of improvements that facilitate these tasks. The first two are the ability to define subarrays and subkernels, which distribute kernels on different devices. The last one is a convenient extension of the subkernel mechanism to distribute computations among heterogeneous devices seeking the best work balance among them. This last contribution includes two analytical models that have proved to automatically provide very good work distributions. Our experiments also show the large programmability advantages of our approach and the negligible overhead incurred. Three approaches to develop multi-device heterogeneous applications are proposed. Easy, efficient and coherent subarray usage for kernels and movements is implemented. Simple argument annotations make it easy to split kernels and arrays among devices. Accurate automatic workload balancing is provided by means of a friendly API. The results are very promising both in terms of performance and programmability.
International Conference on Parallel Processing | 2016
Moisés Viñas; Basilio B. Fraguela; Diego Andrade; Ramón Doallo
The programming of heterogeneous clusters is inherently complex, as these architectures require programmers to manage both distributed memory and computational units of a very different nature. Fortunately, there has been extensive research on the development of frameworks that raise the level of abstraction of cluster-based applications, thus enabling the use of programming models that are much more convenient than the traditional one based on message-passing. One such proposal is the Hierarchically Tiled Array (HTA), a data type that represents globally distributed arrays on which it is possible to perform a wide range of data-parallel operations. In this paper we explore for the first time the development of heterogeneous applications for clusters using HTAs. In order to use a high level API also for the heterogeneous parts of the application, we developed them using the Heterogeneous Programming Library (HPL), which operates on top of OpenCL but provides much better programmability. Our experiments show that this approach is a very attractive alternative, as it obtains large programmability benefits with respect to a traditional implementation based on MPI and OpenCL, while presenting average performance overheads of just around 2%.
Concurrency and Computation: Practice and Experience | 2018
Moisés Viñas; Basilio B. Fraguela; Diego Andrade; Ramón Doallo
The rise of heterogeneous systems has given rise to great challenges for users, as they involve new concepts, restrictions, and frameworks. Their exploitation is further complicated in the context of distributed memory systems, which require the usage of additional different programming paradigms and tools. In this paper, we propose a novel approach to program heterogeneous clusters that is based on high‐level abstractions such as tiles and hierarchical decomposition combined with the powerful APIs that data types and embedded languages can provide in languages such as C++. Rather than building our proposal from scratch, we have implemented it as a natural integration of the existing Hierarchically Tiled Arrays (HTA) and Heterogeneous Programming Library (HPL) projects, i.e., the first one being focused on distributed computing and the second one on heterogeneous processing. The result, called Heterogeneous Hierarchically Tiled Arrays (H2TA), is very intuitive and easy to use thanks to the global view of the data and the single‐threaded view of the execution that it provides at cluster level, together with the transparency it provides with respect to the management of the heterogeneous devices. An evaluation comparing our proposal with MPI‐based implementations shows its large programmability advantages and the reasonable overhead incurred.
Concurrency and Computation: Practice and Experience | 2017
Moisés Viñas; Basilio B. Fraguela; Diego Andrade; Ramón Doallo
Stencil computations are very common in scientific codes. Heterogeneous systems achieve good results solving these problems, but their programming is complex because of the ghost regions required in multi‐device implementations and the difficulty of properly exploiting their hardware. The Heterogeneous Programming Library (HPL) is a recent framework that improves the programmability of heterogeneous devices. This paper describes two extensions of HPL focused on stencil computations. The first one automatically updates the ghost regions they involve. The second one automates the implementation of the computational kernels of these algorithms. In our evaluation, the first mechanism reduces on average the number of lines of code and the Halstead programming effort of the host code of comparable HPL baselines by 34% and 64.2%, respectively, while the second contribution reduces these metrics by 72% and 79% in the computational kernels, respectively. Also, the first technique has negligible performance overheads, while the second one matches the performance of manually developed kernels. As an added benefit, by making these codes easier to develop, these techniques help programmers experiment with optimizations suited for these applications, such as the ghost cell expansion technique, which provides speedups of up to 13% in our experiments.
International Conference on Conceptual Structures | 2015
Moisés Viñas; Basilio B. Fraguela; Zeki Bozkus; Diego Andrade
The Journal of Supercomputing | 2015
Moisés Viñas; Zeki Bozkus; Basilio B. Fraguela; Diego Andrade; Ramón Doallo