Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where María J. Martín is active.

Publication


Featured research published by María J. Martín.


Parallel Computing | 2003

High performance air pollution modeling for a power plant environment

María J. Martín; David E. Singh; J. Carlos Mouriño; Francisco F. Rivera; Ramón Doallo; Javier D. Bruguera

The aim of this work is to provide a high-performance air quality simulation using the STEM-II (Sulphur Transport Eulerian Model 2) program, a large-scale pollution modeling application. First, we optimize the sequential program to increase data locality. Then, we parallelize the program using OpenMP directives for shared-memory systems and the MPI library for distributed-memory machines. Performance results are presented for an SGI O2000 multiprocessor, a Fujitsu AP3000 multicomputer and a cluster of PCs. Experimental results show that the parallel versions of the code achieve substantial reductions in the CPU time needed for each simulation, allowing results to be obtained with adequate speed and reliability for the industrial environment where the model is intended to be applied.
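
The hybrid approach described above combines MPI across nodes with OpenMP within each node. A minimal sketch of that pattern, with a placeholder grid and kernel standing in for the actual STEM-II code:

```c
/* Hypothetical sketch of the hybrid MPI + OpenMP pattern described above.
 * The grid dimensions and the chemistry kernel are placeholders, not the
 * actual STEM-II code. Compile e.g. with: mpicc -fopenmp hybrid.c */
#include <mpi.h>
#include <omp.h>
#include <stdlib.h>

#define NX 128          /* illustrative grid size */
#define NY 128

static void chemistry_step(double *cell) { *cell *= 0.99; }  /* placeholder kernel */

int main(int argc, char **argv) {
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Distribute columns of the grid across MPI processes
     * (assumes nprocs divides NY, for simplicity)... */
    int cols = NY / nprocs;
    double *local = malloc((size_t)NX * cols * sizeof(double));
    for (int i = 0; i < NX * cols; i++) local[i] = 1.0;

    /* ...and use OpenMP threads inside each process for the local columns. */
    #pragma omp parallel for collapse(2)
    for (int j = 0; j < cols; j++)
        for (int i = 0; i < NX; i++)
            chemistry_step(&local[(size_t)j * NX + i]);

    /* Global reduction to gather a summary value on the root process. */
    double local_sum = 0.0, global_sum;
    for (int i = 0; i < NX * cols; i++) local_sum += local[i];
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    free(local);
    MPI_Finalize();
    return 0;
}
```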


Concurrency and Computation: Practice and Experience | 2010

CPPC: a compiler-assisted tool for portable checkpointing of message-passing applications

Gabriel Rodríguez; María J. Martín; Patricia González; Juan Touriño; Ramón Doallo

With the evolution of high-performance computing toward heterogeneous, massively parallel systems, parallel applications have developed new checkpoint-and-restart necessities. Whether due to a failure in the execution or to a migration of the application processes to different machines, checkpointing tools must be able to operate in heterogeneous environments. However, some of the data manipulated by a parallel application are not truly portable. Examples include opaque state (e.g. data structures for communications support) or diversity of interfaces for a single feature (e.g. communications, I/O). Directly manipulating these underlying ad hoc representations renders checkpointing tools unable to work in different environments. Portable checkpointers usually work around portability issues at the cost of transparency: the user must provide information such as what data need to be stored, where to store them, or where to checkpoint. CPPC (ComPiler for Portable Checkpointing) is a checkpointing tool designed to provide both portability and transparency. It is made up of a library and a compiler. The CPPC library contains routines for variable-level checkpointing, using portable code and protocols. The CPPC compiler helps to achieve transparency by relieving the user of time-consuming tasks, such as data-flow and communications analyses and the addition of instrumentation code. This paper covers both the operation of the CPPC library and its compiler support. Experimental results using benchmarks and large-scale real applications are included, demonstrating usability, efficiency and portability.
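
To illustrate the general idea of variable-level, application-directed checkpointing (only the registered live variables are saved, in a portable format), here is a minimal self-contained sketch; the helper names and the text-based file format are hypothetical and are not the CPPC API:

```c
/* Minimal sketch of variable-level, application-directed checkpointing.
 * The helper names and file format are hypothetical illustrations of the
 * idea, NOT the real CPPC API. State is written as text so restart files
 * stay portable across machines with different binary layouts. */
#include <stdio.h>
#include <stdlib.h>

static void save_state(const char *file, long iter, const double *x, int n) {
    FILE *f = fopen(file, "w");
    fprintf(f, "%ld %d\n", iter, n);           /* scalar loop state */
    for (int i = 0; i < n; i++)
        fprintf(f, "%.17g\n", x[i]);           /* registered array, full precision */
    fclose(f);
}

static int restore_state(const char *file, long *iter, double *x, int n) {
    FILE *f = fopen(file, "r");
    if (!f) return 0;                          /* no checkpoint: start from scratch */
    int saved_n;
    fscanf(f, "%ld %d", iter, &saved_n);
    for (int i = 0; i < n && i < saved_n; i++) fscanf(f, "%lf", &x[i]);
    fclose(f);
    return 1;
}

int main(void) {
    enum { N = 1000, STEPS = 100000, CKPT_EVERY = 1000 };
    double *x = calloc(N, sizeof(double));
    long start = 0;

    restore_state("app.ckpt", &start, x, N);   /* resume if a checkpoint exists */

    for (long it = start; it < STEPS; it++) {
        for (int i = 0; i < N; i++) x[i] += 1e-6 * i;    /* placeholder computation */
        if (it % CKPT_EVERY == 0)
            save_state("app.ckpt", it + 1, x, N);        /* only live variables saved */
    }
    free(x);
    return 0;
}
```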


International Parallel and Distributed Processing Symposium | 2010

Servet: A benchmark suite for autotuning on multicore clusters

Jorge González-Domínguez; Guillermo L. Taboada; Basilio B. Fraguela; María J. Martín; Juan Touriño

The growing complexity of computer system hierarchies, due to the increase in the number of cores per processor, levels of cache (some of them shared) and the number of processors per node, as well as high-speed interconnects, demands new optimization techniques and libraries that take advantage of their features. This paper presents Servet, a suite of benchmarks focused on detecting a set of parameters with a strong influence on the overall performance of multicore systems. These benchmarks are able to detect the cache hierarchy, including cache sizes and which caches are shared by each core, bandwidths and bottlenecks in memory accesses, as well as communication latencies among cores. These parameters can be used by auto-tuned codes to increase their performance on multicore clusters. Experimental results on several representative systems show that Servet provides very accurate estimates of the parameters of the machine architecture.
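
As an illustration of the kind of measurement such a suite performs, the following sketch estimates cache-level boundaries by timing random pointer chasing over growing working sets; it is a generic microbenchmark, not the actual Servet code:

```c
/* Illustrative sketch (not the actual Servet code) of one measurement such a
 * suite performs: estimate cache-level boundaries by timing random pointer
 * chasing over working sets of increasing size and looking for latency jumps. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double chase(size_t n_elems, long accesses) {
    size_t *next = malloc(n_elems * sizeof(size_t));
    /* Sattolo's algorithm builds a single random cycle, so the chase visits
     * the whole array and the hardware prefetcher cannot help. */
    for (size_t i = 0; i < n_elems; i++) next[i] = i;
    for (size_t i = n_elems - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }
    size_t p = 0;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long a = 0; a < accesses; a++) p = next[p];   /* dependent loads */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    volatile size_t sink = p; (void)sink;              /* keep the loop alive */
    free(next);
    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    return ns / accesses;                              /* ns per access */
}

int main(void) {
    /* Access latency should jump when the working set exceeds each cache level. */
    for (size_t kb = 16; kb <= 64 * 1024; kb *= 2) {
        size_t n = kb * 1024 / sizeof(size_t);
        printf("%8zu KiB: %6.2f ns/access\n", kb, chase(n, 10 * 1000 * 1000));
    }
    return 0;
}
```

Cache-sharing and communication-latency parameters can be probed with similar timing experiments run on pairs of cores pinned to specific hardware threads.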


IEEE Transactions on Education | 2005

A grid portal for an undergraduate parallel programming course

Juan Touriño; María J. Martín; Jacobo Tarrío; Manuel Arenaz

This paper describes an experience of designing and implementing a portal that provides students enrolled in an undergraduate parallel programming course with transparent remote access to supercomputing facilities. As these facilities are heterogeneous, located at different sites and belonging to different institutions, grid computing technologies have been used to overcome this heterogeneity. The result is a grid portal based on a modular and easily extensible software architecture that provides a uniform and user-friendly interface for students to work on their programming laboratory assignments.


IEICE Transactions on Information and Systems | 2006

Controller/Precompiler for Portable Checkpointing

Gabriel Rodríguez; María J. Martín; Patricia González; Juan Touriño

This paper presents CPPC (Controller/Precompiler for Portable Checkpointing), a checkpointing tool designed for heterogeneous clusters and Grid infrastructures through the use of portable protocols, portable checkpoint files and portable code. It works at the variable level and is user-directed, thus generating small checkpoint files. It allows parallel processes to checkpoint independently, without run-time coordination or message logging. Consistency is achieved at restart time by negotiating the restart point. A directive-based checkpointing precompiler has also been implemented to ease the user's effort. CPPC was designed to work with parallel MPI programs, though it can be used with sequential ones and, thanks to its highly modular design, easily extended to parallel programs written using different message-passing libraries. Experimental results are shown using CPPC with different test applications.


New Generation Computing | 2013

Improving Scalability of Application-Level Checkpoint-Recovery by Reducing Checkpoint Sizes

Iván Cores; Gabriel Rodríguez; María J. Martín; Patricia González; Roberto R. Osorio

The execution times of large-scale parallel applications on today's multi/many-core systems are usually longer than the mean time between failures. Therefore, parallel applications must tolerate hardware failures so that a machine failure does not discard all the computation done so far. Checkpointing and rollback recovery is one of the most popular techniques to implement fault-tolerant applications. However, checkpointing parallel applications is expensive in terms of computing time, network utilization and storage resources. Thus, current checkpoint-recovery techniques should minimize these costs in order to be useful for large-scale systems. In this paper three different and complementary techniques to reduce the size of the checkpoints generated by application-level checkpointing are proposed and implemented. Detailed experimental results obtained on a multicore cluster show the effectiveness of the proposed methods in reducing checkpointing cost.
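
The abstract does not name the three techniques, but a generic way to shrink application-level checkpoints, shown here purely as an illustration, is to compress the checkpoint buffer before writing it to disk:

```c
/* Illustrative sketch of one generic way to shrink checkpoint files
 * (compression with zlib); not necessarily one of the three techniques
 * proposed in the paper. Link with -lz. */
#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>

static int write_compressed_checkpoint(const char *path,
                                        const double *data, size_t n) {
    uLong  src_len = (uLong)(n * sizeof(double));
    uLongf dst_len = compressBound(src_len);
    Bytef *dst = malloc(dst_len);
    if (!dst) return -1;

    /* Deflate the raw checkpoint buffer; zero-filled or repetitive regions
     * (common in scientific arrays) compress particularly well. */
    if (compress(dst, &dst_len, (const Bytef *)data, src_len) != Z_OK) {
        free(dst);
        return -1;
    }

    FILE *f = fopen(path, "wb");
    if (!f) { free(dst); return -1; }
    fwrite(&src_len, sizeof(src_len), 1, f);   /* store original size for restart */
    fwrite(dst, 1, dst_len, f);
    fclose(f);
    free(dst);

    printf("checkpoint: %lu bytes -> %lu bytes\n",
           (unsigned long)src_len, (unsigned long)dst_len);
    return 0;
}

int main(void) {
    size_t n = 1 << 20;                        /* 1M doubles, mostly zeros */
    double *state = calloc(n, sizeof(double));
    for (size_t i = 0; i < n; i += 1000) state[i] = (double)i;
    write_compressed_checkpoint("state.ckpt.z", state, n);
    free(state);
    return 0;
}
```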


International Conference on Parallel Processing | 2002

Exploiting locality in the run-time parallelization of irregular loops

María J. Martín; David E. Singh; Juan Touriño; Francisco F. Rivera

The goal of this work is the efficient parallel execution of loops with indirect array accesses, so that the techniques can be embedded in a parallelizing compiler framework. In this kind of loop pattern, dependences cannot always be determined at compile time because, in many cases, they involve input data that are only known at run time and/or the access pattern is too complex to be analyzed. In this paper we propose run-time strategies for the parallelization of these loops. Our approaches focus not only on extracting parallelism among iterations of the loop, but also on exploiting data access locality to improve memory hierarchy behavior and, thus, the overall program speedup. Two strategies are proposed: one based on graph partitioning techniques and the other on a block-cyclic distribution. Experimental results show that both strategies are complementary and that the choice of the best alternative depends on features of the loop pattern.
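
A common run-time scheme for such loops is the inspector/executor pattern. The sketch below groups iterations by the block of the output array they write, so each block is updated by one thread only; it illustrates the general approach rather than the paper's graph-partitioning or block-cyclic implementations:

```c
/* Minimal inspector/executor sketch for a loop with indirect writes,
 *     for (i) a[idx[i]] += b[i];
 * Iterations are grouped at run time by the block of 'a' they write, so each
 * block is updated by a single thread (no write conflicts, better locality).
 * Compile with -fopenmp. */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N       (1 << 20)    /* iterations */
#define M       (1 << 16)    /* size of a[] */
#define BLOCK   4096         /* elements of a[] per block */
#define NBLOCKS (M / BLOCK)

int main(void) {
    int    *idx = malloc(N * sizeof(int));
    double *a   = calloc(M, sizeof(double));
    double *b   = malloc(N * sizeof(double));
    for (int i = 0; i < N; i++) { idx[i] = rand() % M; b[i] = 1.0; }

    /* Inspector: bucket iteration numbers by the block of a[] they touch. */
    int *count  = calloc(NBLOCKS, sizeof(int));
    int *offset = malloc((NBLOCKS + 1) * sizeof(int));
    int *fill   = malloc(NBLOCKS * sizeof(int));
    int *order  = malloc(N * sizeof(int));
    for (int i = 0; i < N; i++) count[idx[i] / BLOCK]++;
    offset[0] = 0;
    for (int blk = 0; blk < NBLOCKS; blk++) offset[blk + 1] = offset[blk] + count[blk];
    for (int blk = 0; blk < NBLOCKS; blk++) fill[blk] = offset[blk];
    for (int i = 0; i < N; i++) order[fill[idx[i] / BLOCK]++] = i;

    /* Executor: each thread processes whole blocks, so writes never conflict. */
    #pragma omp parallel for schedule(dynamic)
    for (int blk = 0; blk < NBLOCKS; blk++)
        for (int k = offset[blk]; k < offset[blk + 1]; k++) {
            int i = order[k];
            a[idx[i]] += b[i];
        }

    printf("a[0] = %g\n", a[0]);
    free(idx); free(a); free(b); free(count); free(offset); free(fill); free(order);
    return 0;
}
```

The inspector cost is amortized when the same index array is reused across many executions of the loop, which is typical of iterative scientific codes.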


International Conference on Parallel Processing | 2001

The STEM-II air quality model on a distributed memory system

José Carlos Mouriño; María J. Martín; Ramón Doallo; David E. Singh; Francisco F. Rivera; Javier D. Bruguera

STEM-II is an Eulerian air quality model which simulates transport, chemical transformation, emission and deposition processes in an integrated framework. The model is computationally intensive because the governing equations are non-linear, highly coupled and stiff. The purpose of this work is to reduce the CPU time needed for each simulation through a parallel implementation of the code, in order to obtain real-time predictions. The improvements achieved on distributed-memory systems using the MPI library are shown.


Concurrency and Computation: Practice and Experience | 2012

UPCBLAS: a library for parallel matrix computations in Unified Parallel C

Jorge González-Domínguez; María J. Martín; Guillermo L. Taboada; Juan Touriño; Ramón Doallo; Damián A. Mallón; Brian Wibecan

The popularity of Partitioned Global Address Space (PGAS) languages has increased in recent years thanks to their high programmability and performance through an efficient exploitation of data locality, especially on hierarchical architectures such as multicore clusters. This paper describes UPCBLAS, a parallel numerical library for dense matrix computations using the PGAS Unified Parallel C language. The routines developed in UPCBLAS are built on top of sequential basic linear algebra subprograms (BLAS) functions and exploit the particularities of the PGAS paradigm, taking data locality into account in order to achieve good performance. Furthermore, the routines implement other optimization techniques, several of them by automatically taking into account the hardware characteristics of the underlying systems on which they are executed. The library has been experimentally evaluated on a multicore supercomputer and compared with a message-passing-based parallel numerical library, demonstrating good scalability and efficiency.
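
UPCBLAS itself is written in UPC; as a language-neutral illustration of the underlying idea (a parallel routine built from sequential BLAS calls on locally owned blocks), the following sketch uses MPI plus CBLAS with a row-block distribution. It is not the UPCBLAS API:

```c
/* Illustration of building a parallel matrix routine on top of sequential
 * BLAS: each rank owns a row block of A and C, B is replicated, and the
 * local work is a single sequential dgemm call. Not the UPCBLAS API.
 * Compile e.g. with: mpicc gemm.c -lopenblas (or -lcblas). */
#include <mpi.h>
#include <cblas.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int n = 1024;                 /* global matrices are n x n */
    const int rows = n / nprocs;        /* rows owned by this rank (assumes divisibility) */

    double *A_loc = malloc((size_t)rows * n * sizeof(double));
    double *B     = malloc((size_t)n * n * sizeof(double));   /* replicated */
    double *C_loc = calloc((size_t)rows * n, sizeof(double));
    for (int i = 0; i < rows * n; i++) A_loc[i] = 1.0;
    for (int i = 0; i < n * n; i++)    B[i] = 1.0 / n;

    /* Local computation: one sequential BLAS call on the owned row block. */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                rows, n, n, 1.0, A_loc, n, B, n, 0.0, C_loc, n);

    if (rank == 0)
        printf("C_loc[0] = %g (expected 1.0)\n", C_loc[0]);

    free(A_loc); free(B); free(C_loc);
    MPI_Finalize();
    return 0;
}
```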


The Computer Journal | 2011

Analysis of Performance-impacting Factors on Checkpointing Frameworks

Gabriel Rodríguez; María J. Martín; Patricia González; Juan Touriño

This paper focuses on the performance evaluation of Compiler for Portable Checkpointing (CPPC), a tool for the checkpointing of parallel message-passing applications. Its performance, and the factors that impact it, are rigorously identified and assessed. The tests were performed on a public supercomputing infrastructure, using a large number of very different applications, and show excellent results in terms of both performance and the effort required for integration into user codes. Statistical analysis techniques have been used to better approximate the performance of the tool. Quantitative and qualitative comparisons with other rollback-recovery approaches to fault tolerance are also included. All these data and comparisons are then discussed in an effort to extract meaningful conclusions about the state of the art and future research trends in the rollback-recovery field.

Collaboration


Dive into María J. Martín's collaborations.

Top Co-Authors

Francisco F. Rivera
University of Santiago de Compostela

Iván Cores
University of A Coruña