Juan Touriño | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Juan Touriño is active.

Explore More

Publication

Featured researches published by Juan Touriño.

european pvm mpi users group meeting on recent advances in parallel virtual machine and message passing interface | 2009

Performance Evaluation of MPI, UPC and OpenMP on Multicore Architectures

Damián A. Mallón; Guillermo L. Taboada; Carlos Teijeiro; Juan Touriño; Basilio B. Fraguela; Andrés Gómez; Ramón Doallo; J. Carlos Mouriño

The current trend to multicore architectures underscores the need of parallelism. While new languages and alternatives for supporting more efficiently these systems are proposed, MPI faces this new challenge. Therefore, up-to-date performance evaluations of current options for programming multicore systems are needed. This paper evaluates MPI performance against Unified Parallel C (UPC) and OpenMP on multicore architectures. From the analysis of the results, it can be concluded that MPI is generally the best choice on multicore systems with both shared and hybrid shared/distributed memory, as it takes the highest advantage of data locality, the key factor for performance in these systems. Regarding UPC, although it exploits efficiently the data layout in memory, it suffers from remote shared memory accesses, whereas OpenMP usually lacks efficient data locality support and is restricted to shared memory systems, which limits its scalability.

Science of Computer Programming | 2013

Java in the High Performance Computing arena: Research, practice and experience

Guillermo L. Taboada; Sabela Ramos; Roberto R. Expósito; Juan Touriño; Ramón Doallo

The rising interest in Java for High Performance Computing (HPC) is based on the appealing features of this language for programming multi-core cluster architectures, particularly the built-in networking and multithreading support, and the continuous increase in Java Virtual Machine (JVM) performance. However, its adoption in this area is being delayed by the lack of analysis of the existing programming options in Java for HPC and thorough and up-to-date evaluations of their performance, as well as the unawareness on current research projects in this field, whose solutions are needed in order to boost the embracement of Java in HPC. This paper analyzes the current state of Java for HPC, both for shared and distributed memory programming, presents related research projects, and finally, evaluates the performance of current Java HPC solutions and research developments on two shared memory environments and two InfiniBand multi-core clusters. The main conclusions are that: (1) the significant interest in Java for HPC has led to the development of numerous projects, although usually quite modest, which may have prevented a higher development of Java in this field; (2) Java can achieve almost similar performance to natively compiled languages, both for sequential and parallel applications, being an alternative for HPC programming; (3) the recent advances in the efficient support of Java communications on shared memory and low-latency networks are bridging the gap between Java and natively compiled applications in HPC. Thus, the good prospects of Java in this area are attracting the attention of both industry and academia, which can take significant advantage of Java adoption in HPC.

Future Generation Computer Systems | 2013

Performance analysis of HPC applications in the cloud

Roberto R. Expósito; Guillermo L. Taboada; Sabela Ramos; Juan Touriño; Ramón Doallo

The scalability of High Performance Computing (HPC) applications depends heavily on the efficient support of network communications in virtualized environments. However, Infrastructure as a Service (IaaS) providers are more focused on deploying systems with higher computational power interconnected via high-speed networks rather than improving the scalability of the communication middleware. This paper analyzes the main performance bottlenecks in HPC application scalability on the Amazon EC2 Cluster Compute platform: (1) evaluating the communication performance on shared memory and a virtualized 10 Gigabit Ethernet network; (2) assessing the scalability of representative HPC codes, the NAS Parallel Benchmarks, using an important number of cores, up to 512; (3) analyzing the new cluster instances (CC2), both in terms of single instance performance, scalability and cost-efficiency of its use; (4) suggesting techniques for reducing the impact of the virtualization overhead in the scalability of communication-intensive HPC codes, such as the direct access of the Virtual Machine to the network and reducing the number of processes per instance; and (5) proposing the combination of message-passing with multithreading as the most scalable and cost-effective option for running HPC applications on the Amazon EC2 Cluster Compute platform. Highlights? Performance results of HPC applications in the cloud using up to 512 cores. ? Up-to-date performance evaluation of the Amazon EC2 Cluster Compute platform. ? High Performance Cloud Computing applications rely on scalable communication. ? Proposal of new techniques for increasing scalability of HPC codes in the cloud. ? Using several levels of parallelism is key for HPC scalability in the cloud.

principles and practice of programming in java | 2009

Java for high performance computing: assessment of current research and practice

Guillermo L. Taboada; Juan Touriño; Ramón Doallo

The rising interest in Java for High Performance Computing (HPC) is based on the appealing features of this language for programming multi-core cluster architectures, particularly the built-in networking and multithreading support, and the continuous increase in Java Virtual Machine (JVM) performance. However, its adoption in this area is being delayed by the lack of analysis of the existing programming options in Java for HPC and evaluations of their performance, as well as the unawareness of the current research projects in this field, whose solutions are needed in order to boost the embracement of Java in HPC. This paper analyzes the current state of Java for HPC, both for shared and distributed memory programming, presents related research projects, and finally, evaluates the performance of current Java HPC solutions and research developments on a multi-core cluster with a high-speed network, InfiniBand, and a 24-core shared memory machine. The main conclusions are that: (1) the significant interest on Java for HPC has led to the development of numerous projects, although usually quite modest, which may have prevented a higher development of Java in this field; and (2) Java can achieve almost similar performance to native languages, both for sequential and parallel applications, being an alternative for HPC programming. Thus, the good prospects of Java in this area are attracting the attention of both industry and academia, which can take significant advantage of Java adoption in HPC.

ACM Transactions on Programming Languages and Systems | 2008

XARK: An extensible framework for automatic recognition of computational kernels

Manuel Arenaz; Juan Touriño; Ramón Doallo

The recognition of program constructs that are frequently used by software developers is a powerful mechanism for optimizing and parallelizing compilers to improve the performance of the object code. The development of techniques for automatic recognition of computational kernels such as inductions, reductions and array recurrences has been an intensive research area in the scope of compiler technology during the 90s. This article presents a new compiler framework that, unlike previous techniques that focus on specific and isolated kernels, recognizes a comprehensive collection of computational kernels that appear frequently in full-scale real applications. The XARK compiler operates on top of the Gated Single Assignment (GSA) form of a high-level intermediate representation (IR) of the source code. Recognition is carried out through a demand-driven analysis of this high-level IR at two different levels. First, the dependences between the statements that compose the strongly connected components (SCCs) of the data-dependence graph of the GSA form are analyzed. As a result of this intra-SCC analysis, the computational kernels corresponding to the execution of the statements of the SCCs are recognized. Second, the dependences between statements of different SCCs are examined in order to recognize more complex kernels that result from combining simpler kernels in the same code. Overall, the XARK compiler builds a hierarchical representation of the source code as kernels and dependence relationships between those kernels. This article describes in detail the collection of computational kernels recognized by the XARK compiler. Besides, the internals of the recognition algorithms are presented. The design of the algorithms enables to extend the recognition capabilities of XARK to cope with new kernels, and provides an advanced symbolic analysis framework to run other compiler techniques on demand. Finally, extensive experiments showing the effectiveness of XARK for a collection of benchmarks from different application domains are presented. In particular, the SparsKit-II library for the manipulation of sparse matrices, the Perfect benchmarks, the SPEC CPU2000 collection and the PLTMG package for solving elliptic partial differential equations are analyzed in detail.

International Journal of Geographical Information Science | 2003

Research Article: A GIS-embedded system to support land consolidation plans in Galicia

Juan Touriño; Jorge Parapar; Ramón Doallo; Marcos Boullón; Francisco F. Rivera; Javier D. Bruguera; Xesús P. González; Rafael Crecente; Carlos J. Álvarez

Land consolidation is a strategic instrument for rural planning and thus economic development in the Spanish region of Galicia. This paper describes an experimental system embedded in a GIS environment to aid rural engineers to develop land consolidation plans. The system supports all the stages of the plan and many functionalities are implemented as heuristic processes based on expert knowledge and advice. The overall aim is to overcome administrative and technical problems of traditional consolidation procedures. The system provides an integrated framework for the management of spatial and administrative consolidation information. It also includes optimization-based algorithms for the automated generation of multiple alternative parcel reallocations, as well as an environment to refine and objectively evaluate the proposed solutions. These key capabilities result in a powerful tool for decision making that dramatically reduces the time and cost of land consolidation plans. Pilot experiences in two consolidation zones of Galicia assess the feasibility and effectiveness of the system.

Concurrency and Computation: Practice and Experience | 2010

CPPC: a compiler-assisted tool for portable checkpointing of message-passing applications: CPPC: COMPILER-ASSISTED PORTABLE CHECKPOINTING

Gabriel Rodríguez; María J. Martín; Patricia González; Juan Touriño; Ramón Doallo

With the evolution of high‐performance computing toward heterogeneous, massively parallel systems, parallel applications have developed new checkpoint and restart necessities. Whether due to a failure in the execution or to a migration of the application processes to different machines, checkpointing tools must be able to operate in heterogeneous environments. However, some of the data manipulated by a parallel application are not truly portable. Examples of these include opaque state (e.g. data structures for communications support) or diversity of interfaces for a single feature (e.g. communications, I/O). Directly manipulating the underlying ad hoc representations renders checkpointing tools unable to work on different environments. Portable checkpointers usually work around portability issues at the cost of transparency: the user must provide information such as what data need to be stored, where to store them, or where to checkpoint. CPPC (ComPiler for Portable Checkpointing) is a checkpointing tool designed to feature both portability and transparency. It is made up of a library and a compiler. The CPPC library contains routines for variable level checkpointing, using portable code and protocols. The CPPC compiler helps to achieve transparency by relieving the user from time‐consuming tasks, such as data flow and communications analyses and adding instrumentation code. This paper covers both the operation of the CPPC library and its compiler support. Experimental results using benchmarks and large‐scale real applications are included, demonstrating usability, efficiency, and portability. Copyright

parallel, distributed and network-based processing | 2009

NPB-MPJ: NAS Parallel Benchmarks Implementation for Message-Passing in Java

Damián A. Mallón; Guillermo L. Taboada; Juan Touriño; Ramón Doallo

Java is a valuable and emerging alternative for the development of parallel applications, thanks to the availability of several Java message-passing libraries and its full multithreading support. The combination of both shared and distributed memory programming is an interesting option for parallel programming multi-core systems. However, the concerns about Java performance are hindering its adoption in this field, although it is difficult to evaluate accurately its performance due to the lack of standard benchmarks in Java.This paper presents NPB-MPJ, the first extensive implementation of the NAS Parallel Benchmarks (NPB), the standard parallel benchmark suite, for Message-Passing in Java (MPJ) libraries. Together with the design and implementation details of NPB-MPJ, this paper gathers several optimization techniques that can serve as a guide for the development of more efficient Java applications for High Performance Computing (HPC). NPB-MPJ has been used in the performance evaluation of Java against C/Fortran parallel libraries on two representative multi-core clusters. Thus, NPB-MPJ provides an up-to-date snapshot of MPJ performance, whose comparative analysis of current Java and native parallel solutions confirms that MPJ is an alternative for parallel programming multi-core systems.

ieee international conference on high performance computing data and analytics | 2012

Communication avoiding and overlapping for numerical linear algebra

Evangelos Georganas; Jorge González-Domínguez; Edgar Solomonik; Yili Zheng; Juan Touriño; Katherine A. Yelick

To efficiently scale dense linear algebra problems to future exascale systems, communication cost must be avoided or overlapped. Communication-avoiding 2.5D algorithms improve scalability by reducing inter-processor data transfer volume at the cost of extra memory usage. Communication overlap attempts to hide messaging latency by pipelining messages and overlapping with computational work. We study the interaction and compatibility of these two techniques for two matrix multiplication algorithms (Cannon and SUMMA), triangular solve, and Cholesky factorization. For each algorithm, we construct a detailed performance model that considers both critical path dependencies and idle time. We give novel implementations of 2.5D algorithms with overlap for each of these problems. Our software employs UPC, a partitioned global address space (PGAS) language that provides fast one-sided communication. We show communication avoidance and overlap provide a cumulative benefit as core counts scale, including results using over 24K cores of a Cray XE6 system.

international parallel and distributed processing symposium | 2010

Servet: A benchmark suite for autotuning on multicore clusters

Jorge González-Domínguez; Guillermo L. Taboada; Basilio B. Fraguela; María J. Martín; Juan Touriño

The growing complexity in computer system hierarchies due to the increase in the number of cores per processor, levels of cache (some of them shared) and the number of processors per node, as well as the high-speed interconnects, demands the use of new optimization techniques and libraries that take advantage of their features. In this paper Servet, a suite of benchmarks focused on detecting a set of parameters with high influence in the overall performance of multicore systems, is presented. These benchmarks are able to detect the cache hierarchy, including their size and which caches are shared by each core, bandwidths and bottlenecks in memory accesses, as well as communication latencies among cores. These parameters can be used by auto-tuned codes to increase their performance in multicore clusters. Experimental results using different representative systems show that Servet provides very accurate estimates of the parameters of the machine architecture.

Explore More