José Carlos Mouriño

Publication

Featured research published by José Carlos Mouriño.

Proceedings of the Third Conference on Partitioned Global Address Space Programing Models | 2009

Evaluation of UPC programmability using classroom studies

Carlos Teijeiro; Guillermo L. Taboada; Juan Touriño; Basilio B. Fraguela; Ramón Doallo; Damián A. Mallón; Andrés Gómez; José Carlos Mouriño; Brian Wibecan

The study of a language in terms of programmability is a very interesting issue in parallel programming. Traditional approaches in this field have relied on methods such as counting Lines of Code or analyzing programs to demonstrate the benefits of one paradigm over another. Nevertheless, these methods usually focus only on code analysis and give little weight to the conditions of the development process, the learning stage, or the benefits and disadvantages of the language as reported by the programmers. In this paper we present a methodology for carrying out a programmability study of UPC (Unified Parallel C) through classroom studies with a group of novice UPC programmers. This work describes the design of these sessions and the analysis of the results obtained (code analysis and survey responses). From these results it is possible to characterize the current benefits and disadvantages of UPC and to report some desirable features that could be included in the language standard.

International Conference on Parallel Processing | 2001

The STEM-II air quality model on a distributed memory system

José Carlos Mouriño; María J. Martín; Ramón Doallo; David E. Singh; Francisco F. Rivera; Javier D. Bruguera

STEM-II is an Eulerian air quality model which simulates transport, chemical transformation, emission and deposition processes in an integrated framework. The model is computationally intensive because the governing equations are non-linear, highly coupled and stiff. The purpose of this work is to reduce the CPU time needed by each simulation by parallelizing the code, in order to obtain real-time predictions. The improvements achieved on distributed memory systems using the MPI library are shown.

High Performance Computing and Communications | 2009

Performance Evaluation of Unified Parallel C Collective Communications

Guillermo L. Taboada; Carlos Teijeiro; Juan Touriño; Basilio B. Fraguela; Ramón Doallo; José Carlos Mouriño; Damián A. Mallón; Andrés Gómez

Unified Parallel C (UPC) is an extension of ANSI C designed for parallel programming. UPC collective primitives, which are part of the UPC standard, increase programming productivity while reducing the communication overhead. This paper presents an up-to-date performance evaluation of two publicly available UPC collective implementations in three scenarios: shared, distributed, and hybrid shared/distributed memory architectures. Characterizing the throughput of collective primitives is useful both for increasing performance through the runtime selection of the appropriate primitive implementation, which depends on the message size and the memory architecture, and for detecting inefficient implementations. In fact, based on the analysis of UPC collective performance, we propose some optimizations for the current UPC collective libraries. We also compare the performance of the UPC collective primitives with that of their MPI counterparts, showing that there is room for improvement. Finally, this paper concludes with an analysis of the influence of UPC collective performance on a representative communication-intensive application, showing that optimizing the collectives is highly important for UPC scalability.

Proceedings of the Third Conference on Partitioned Global Address Space Programing Models | 2009

UPC performance evaluation on a multicore system

Damián A. Mallón; Andrés Gómez; José Carlos Mouriño; Guillermo L. Taboada; Carlos Teijeiro; Juan Touriño; Basilio B. Fraguela; Ramón Doallo; Brian Wibecan

As the size and architectural complexity of High Performance Computing systems increase, the need for productive programming tools and languages becomes more important. The UPC language aims to be a good choice for productive parallel programming. However, productivity is influenced not only by the expressiveness of the language, but also by its performance. To assess current UPC performance on high performance multicore systems, and therefore to help improve the future productivity of UPC developers, this paper provides an up-to-date UPC performance evaluation at various levels: evaluating two collective implementations, comparing their results with their MPI counterparts, and finally evaluating UPC and MPI performance in computational kernels. This analysis shows a path to optimizing the performance of UPC collectives. This work also provides a performance snapshot of UPC versus MPI, currently the most popular choice for parallel programming. This snapshot, together with the analysis of UPC collectives, shows that there is room for improvement and that, despite its worse performance, UPC is suitable for the productive development of most HPC applications.

Journal of Computer Science and Technology | 2013

Design and Implementation of an Extended Collectives Library for Unified Parallel C

Carlos Teijeiro; Guillermo L. Taboada; Juan Touriño; Ramón Doallo; José Carlos Mouriño; Damián A. Mallón; Brian Wibecan

Unified Parallel C (UPC) is a parallel extension of ANSI C based on the Partitioned Global Address Space (PGAS) programming model, which provides a shared memory view that simplifies code development while it can take advantage of the scalability of distributed memory architectures. Therefore, UPC allows programmers to write parallel applications on hybrid shared/distributed memory architectures, such as multi-core clusters, in a more productive way, accessing remote memory by means of different high-level language constructs, such as assignments to shared variables or collective primitives. However, the standard UPC collectives library includes a reduced set of eight basic primitives with quite limited functionality. This work presents the design and implementation of extended UPC collective functions that overcome the limitations of the standard collectives library, allowing, for example, the use of a specific source and destination thread or defining the amount of data transferred by each particular thread. This library fulfills the demands made by the UPC developers community and implements portable algorithms, independent of the specific UPC compiler/runtime being used. The use of a representative set of these extended collectives has been evaluated using two applications and four kernels as case studies. The results obtained confirm the suitability of the new library to provide easier programming without trading off performance, thus achieving high productivity in parallel programming to harness the performance of hybrid shared/distributed memory architectures in high performance computing.

IEEE International Conference on High Performance Computing Data and Analytics | 2001

Parallelization of the STEM-II Air Quality Model

José Carlos Mouriño; David E. Singh; María J. Martín; J. M. Eiroa; Francisco F. Rivera; Ramón Doallo; Javier D. Bruguera

STEM-II is an Eulerian numerical model that simulates the behavior of pollutant factors in the air. In this paper the computational requirements of the program in terms of memory storage and execution time are analyzed. The results of this analysis are conclusive regarding the need for parallel processing to achieve reasonable execution times. The improvements achieved after parallelizing the code on a distributed memory multiprocessor using the MPI standard message passing library are then shown.

Grid Computing | 2005

Modeling execution time of selected computation and communication kernels on grids

Marcos Boullón; José Carlos Cabaleiro; Ramón Doallo; Patricia González; Diego Martínez; María J. Martín; José Carlos Mouriño; Tomás F. Pena; Francisco F. Rivera

This paper introduces a methodology to model the execution time of several computation and communication routines developed within the CrossGrid project. The purpose of the methodology is to provide performance information about selected computational kernels when they are executed on a grid. The models are based on analytical expressions obtained from exhaustive monitored measurements. Even though the kernels considered in this work include both application-dependent and general-purpose ones, the methodology can be applied to any kind of kernel in which the most relevant part of the execution time is due to computation and/or communication. We focus on MPI-based communications. In addition, an interactive Graphical User Interface was developed to summarize the information provided by the models and show it from different views.
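As an illustration of fitting an analytical expression to timing measurements, the sketch below assumes a simple linear cost model, T(n) = alpha + beta * n, where n is the message size. The abstract does not give the paper's actual expressions, so the model form and the names `fit_linear` and `predict` are assumptions made for this example only.

```python
# Minimal sketch: least-squares fit of an ASSUMED linear cost model
# T(n) = alpha + beta * n to measured (size, time) pairs.
# The model form and all names here are hypothetical, not taken from the paper.

def fit_linear(sizes, times):
    """Return (alpha, beta) minimizing the squared error of alpha + beta * size."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    beta = sxy / sxx                  # per-unit cost (inverse bandwidth)
    alpha = mean_y - beta * mean_x    # fixed start-up cost (latency)
    return alpha, beta


def predict(alpha, beta, size):
    """Predicted execution time for a message of the given size."""
    return alpha + beta * size


if __name__ == "__main__":
    # Made-up measurements, for illustration only.
    sizes = [1000, 2000, 3000, 4000]
    times = [2.5, 4.5, 6.5, 8.5]
    alpha, beta = fit_linear(sizes, times)
    print(alpha, beta, predict(alpha, beta, 5000))
```

A model fitted this way can be queried for sizes that were never measured, which is the kind of performance information the methodology aims to provide.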

Parallel Computing | 2002

A Cluster-Based Solution for a High Performance Air Quality Simulation

José Carlos Mouriño; Patricia González; María J. Martín; Ramón Doallo

Parallel computing on networks of workstations or PCs has been gaining popularity in recent years because of their competitive cost compared to supercomputers. In this paper we provide a cluster-based solution for a high performance air quality model, the STEM-II program. The application has been parallelized using the standard MPI library on a cluster of PCs. Performance results are presented and compared with those obtained on a multicomputer. Speedup and scalability, as well as the communication/computation cost ratio, are evaluated on both systems.
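The metrics named in this abstract have standard definitions, sketched below. The function names and the timing values are hypothetical and are not results from the paper.

```python
# Minimal sketch of the standard definitions of speedup, parallel efficiency,
# and communication/computation ratio. All names and numbers are illustrative.

def speedup(t_serial, t_parallel):
    """Speedup: serial execution time divided by parallel execution time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency: speedup divided by the number of processors."""
    return speedup(t_serial, t_parallel) / n_procs


def comm_comp_ratio(t_comm, t_comp):
    """Ratio of time spent communicating to time spent computing."""
    return t_comm / t_comp


if __name__ == "__main__":
    # Made-up wall-clock times in seconds, keyed by processor count.
    timings = {1: 100.0, 2: 52.0, 4: 28.0, 8: 16.0}
    t1 = timings[1]
    for procs, t in sorted(timings.items()):
        print(procs, speedup(t1, t), efficiency(t1, t, procs))
```

Scalability is usually judged by how efficiency changes as the processor count grows: a rising communication/computation ratio tends to pull it down.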

International Parallel and Distributed Processing Symposium | 2003

Increasing the throughput of available resources using management tools based on Grid technologies

Patricia González; María J. Martín; José Carlos Mouriño; Ramón Doallo

The aim of this work is to build a dynamic virtual high performance platform made up of the heterogeneous individual machines available at a given moment in a work-center. To achieve this objective, a low-intrusion monitor and scheduler system layered on top of an existing Grid technology is proposed. The monitoring tool indicates which nodes have become available or unavailable so that they can be added to or removed from the virtual platform. The scheduler manages the distribution of jobs among the available resources. The final goal is to increase the computational capacity of the work-center in a way that is both simple and free of cost for the users. Results obtained using the proposed tools to run two different applications, an image modeling application and an air quality model simulation, demonstrate the feasibility and efficiency of our proposal.

Computación de altas prestaciones: actas de las XV Jornadas de Paralelismo, Almería, 15, 16 y 17 de septiembre de 2004, 2004, ISBN 84-8240-714-7, págs. 138-143 | 2004