Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Ramón Doallo is active.

Publication


Featured research published by Ramón Doallo.


Nature Methods | 2012

jModelTest 2: more models, new heuristics and parallel computing

Diego Darriba; Guillermo L. Taboada; Ramón Doallo; David Posada

Supplementary material: Table 1, new features in jModelTest 2; Table 2, model selection accuracy; Table 3, mean square errors for model-averaged estimates; Note 1, hill-climbing hierarchical clustering algorithm; Note 2, heuristic filtering; Note 3, simulations from prior distributions; Note 4, speed-up benchmark on real and simulated datasets.


Bioinformatics | 2011

ProtTest 3

Diego Darriba; Guillermo L. Taboada; Ramón Doallo; David Posada

We have implemented a high-performance computing (HPC) version of ProtTest that can be executed in parallel on multicore desktops and clusters. This version, called ProtTest 3, includes new features and extended capabilities. Availability: ProtTest 3 source code and binaries are freely available under the GNU license for download from http://darwin.uvigo.es/software/prottest3, linked to a Mercurial repository at Bitbucket (https://bitbucket.org/). Contact: [email protected]. Supplementary data are available at Bioinformatics online.
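
ProtTest 3 is a Java application, but its internals are not reproduced here. The following is a minimal Java sketch of the general pattern the abstract describes: scoring a set of candidate amino-acid replacement models concurrently on a multicore machine. The model names are standard empirical candidates, while scoreModel is a hypothetical placeholder for the real likelihood computation.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.*;

// Minimal sketch (not ProtTest 3 code): evaluate candidate amino-acid
// replacement models concurrently and report the best-scoring one.
public class ParallelModelSelection {

    // Hypothetical placeholder for the expensive step: computing a model's
    // fit to the alignment (e.g., a maximum-likelihood score).
    static double scoreModel(String model) {
        return -(model.hashCode() & 0xffff) / 1000.0;  // arbitrary stand-in
    }

    public static void main(String[] args) throws Exception {
        List<String> models = List.of("JTT", "WAG", "LG", "Dayhoff", "Blosum62");

        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

        // Submit one scoring task per candidate model.
        Map<String, Future<Double>> scores = new ConcurrentHashMap<>();
        for (String m : models) {
            scores.put(m, pool.submit(() -> scoreModel(m)));
        }

        // Collect the results and pick the best-fit model.
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (var e : scores.entrySet()) {
            double s = e.getValue().get();
            if (s > bestScore) { bestScore = s; best = e.getKey(); }
        }
        pool.shutdown();
        System.out.println("Best-fit model (toy score): " + best);
    }
}
```

Since each candidate model can be scored independently, a thread pool over the available cores is the natural decomposition on a multicore desktop; a cluster version would distribute the same per-model tasks across nodes.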


IEEE Computer | 1996

Parallel programming with Polaris

William Blume; Ramón Doallo; Rudolf Eigenmann; John R. Grout; Jay Hoeflinger; Thomas R. Lawrence

Parallel programming tools are limited, making effective parallel programming difficult and cumbersome. Compilers that translate conventional sequential programs into parallel form would liberate programmers from the complexities of explicit, machine-oriented parallel programming. The paper discusses parallel programming with Polaris, an experimental translator of conventional Fortran programs that targets machines such as the Cray T3D.
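
Polaris operates on Fortran source, which is not shown here. As a language-neutral illustration of what translating a sequential loop into parallel form means, the hand-written Java sketch below contrasts a dependence-free sequential loop with an equivalent explicitly parallel version, the kind of transformation such a translator automates.

```java
import java.util.stream.IntStream;

// Illustration only (Polaris itself targets Fortran): the kind of loop-level
// transformation a parallelizing compiler performs, written by hand in Java.
public class LoopParallelization {
    public static void main(String[] args) {
        int n = 1_000_000;
        double[] a = new double[n], b = new double[n];

        // Sequential form: each iteration is independent (no loop-carried
        // dependence), which is what the compiler must prove before
        // parallelizing the loop.
        for (int i = 0; i < n; i++) {
            a[i] = 2.0 * b[i] + 1.0;
        }

        // Parallel form: the same loop expressed as independent chunks of
        // work, analogous to the code such a translator would emit.
        IntStream.range(0, n).parallel().forEach(i -> a[i] = 2.0 * b[i] + 1.0);
    }
}
```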


European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface | 2009

Performance Evaluation of MPI, UPC and OpenMP on Multicore Architectures

Damián A. Mallón; Guillermo L. Taboada; Carlos Teijeiro; Juan Touriño; Basilio B. Fraguela; Andrés Gómez; Ramón Doallo; J. Carlos Mouriño

The current trend toward multicore architectures underscores the need for parallelism. While new languages and alternatives for supporting these systems more efficiently are being proposed, MPI faces this new challenge. Therefore, up-to-date performance evaluations of the current options for programming multicore systems are needed. This paper evaluates MPI performance against Unified Parallel C (UPC) and OpenMP on multicore architectures. From the analysis of the results, it can be concluded that MPI is generally the best choice on multicore systems with both shared and hybrid shared/distributed memory, as it takes the greatest advantage of data locality, the key factor for performance in these systems. Regarding UPC, although it exploits the data layout in memory efficiently, it suffers from remote shared memory accesses, whereas OpenMP usually lacks efficient data locality support and is restricted to shared memory systems, which limits its scalability.


European Conference on Parallel Processing | 2010

ProtTest-HPC: fast selection of best-fit models of protein evolution

Diego Darriba; Guillermo L. Taboada; Ramón Doallo; David Posada

The use of probabilistic models of amino acid replacement is essential for the study of protein evolution, and programs like ProtTest implement different strategies to identify the best-fit model for the data at hand. For large protein alignments this task can demand vast computational resources, making it impractical to justify the model used in the analysis. We have implemented a High Performance Computing (HPC) version of ProtTest. ProtTest-HPC can be executed in parallel in HPC environments as (1) a GUI-based desktop version that uses multi-core processors and (2) a cluster-based version that distributes the computational load among nodes. The use of ProtTest-HPC resulted in significant performance gains, with speedups of up to 50 on a high-performance cluster.
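
For context on reading such figures (the core count below is a hypothetical example, not a number from the paper): speedup is S = T_sequential / T_parallel and parallel efficiency is E = S / p on p cores, so a speedup of 50 obtained on, say, 64 cores would correspond to an efficiency of roughly 78%.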


Science of Computer Programming | 2013

Java in the High Performance Computing arena: Research, practice and experience

Guillermo L. Taboada; Sabela Ramos; Roberto R. Expósito; Juan Touriño; Ramón Doallo

The rising interest in Java for High Performance Computing (HPC) is based on the appealing features of this language for programming multi-core cluster architectures, particularly its built-in networking and multithreading support, and the continuous increase in Java Virtual Machine (JVM) performance. However, its adoption in this area has been delayed by the lack of analysis of the existing Java programming options for HPC and of thorough, up-to-date evaluations of their performance, as well as by limited awareness of current research projects in this field, whose solutions are needed to boost the adoption of Java in HPC. This paper analyzes the current state of Java for HPC, both for shared and distributed memory programming, presents related research projects, and finally evaluates the performance of current Java HPC solutions and research developments on two shared memory environments and two InfiniBand multi-core clusters. The main conclusions are that: (1) the significant interest in Java for HPC has led to the development of numerous projects, although these are usually quite modest, which may have held back a wider development of Java in this field; (2) Java can achieve performance close to that of natively compiled languages, both for sequential and parallel applications, making it an alternative for HPC programming; (3) the recent advances in the efficient support of Java communications on shared memory and low-latency networks are bridging the gap between Java and natively compiled applications in HPC. Thus, the good prospects of Java in this area are attracting the attention of both industry and academia, which can take significant advantage of Java adoption in HPC.
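
As a small illustration of the built-in multithreading the paper highlights (a generic sketch, not one of the paper's benchmark codes), the following Java fork/join kernel sums an array using the standard java.util.concurrent machinery:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sketch of Java's built-in shared-memory parallelism, one of the language
// features credited for Java's appeal in HPC. Not code from the paper.
public class ParallelSum extends RecursiveTask<Double> {
    private static final int THRESHOLD = 10_000;
    private final double[] data;
    private final int lo, hi;

    ParallelSum(double[] data, int lo, int hi) {
        this.data = data; this.lo = lo; this.hi = hi;
    }

    @Override
    protected Double compute() {
        if (hi - lo <= THRESHOLD) {            // small chunk: sum sequentially
            double s = 0.0;
            for (int i = lo; i < hi; i++) s += data[i];
            return s;
        }
        int mid = (lo + hi) >>> 1;             // large chunk: split and recurse
        ParallelSum left = new ParallelSum(data, lo, mid);
        ParallelSum right = new ParallelSum(data, mid, hi);
        left.fork();                           // run left half asynchronously
        return right.compute() + left.join();  // compute right, then join left
    }

    public static void main(String[] args) {
        double[] data = new double[10_000_000];
        java.util.Arrays.fill(data, 1.0);
        double sum = ForkJoinPool.commonPool()
                                 .invoke(new ParallelSum(data, 0, data.length));
        System.out.println("sum = " + sum);    // expected: 1.0E7
    }
}
```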


Future Generation Computer Systems | 2013

Performance analysis of HPC applications in the cloud

Roberto R. Expósito; Guillermo L. Taboada; Sabela Ramos; Juan Touriño; Ramón Doallo

The scalability of High Performance Computing (HPC) applications depends heavily on the efficient support of network communications in virtualized environments. However, Infrastructure as a Service (IaaS) providers are more focused on deploying systems with higher computational power interconnected via high-speed networks than on improving the scalability of the communication middleware. This paper analyzes the main performance bottlenecks in HPC application scalability on the Amazon EC2 Cluster Compute platform: (1) evaluating the communication performance on shared memory and a virtualized 10 Gigabit Ethernet network; (2) assessing the scalability of representative HPC codes, the NAS Parallel Benchmarks, using a significant number of cores, up to 512; (3) analyzing the new cluster instances (CC2) in terms of single-instance performance, scalability, and cost-efficiency; (4) suggesting techniques for reducing the impact of the virtualization overhead on the scalability of communication-intensive HPC codes, such as direct access of the Virtual Machine to the network and reducing the number of processes per instance; and (5) proposing the combination of message passing with multithreading as the most scalable and cost-effective option for running HPC applications on the Amazon EC2 Cluster Compute platform. Highlights: performance results of HPC applications in the cloud using up to 512 cores; an up-to-date performance evaluation of the Amazon EC2 Cluster Compute platform; High Performance Cloud Computing applications rely on scalable communication; new techniques are proposed for increasing the scalability of HPC codes in the cloud; using several levels of parallelism is key for HPC scalability in the cloud.


International Conference on Parallel Architectures and Compilation Techniques | 1999

Automatic analytical modeling for the estimation of cache misses

Basilio B. Fraguela; Ramón Doallo; Emilio L. Zapata

Caches play a very important role in the performance of modern computer systems due to the gap between memory and processor speed. Among the methods for studying their behaviour, the most widely used has been trace-driven simulation. Nevertheless, analytical modeling gives more information and requires less computation time, which allows it to be used at compilation time to drive automatic code optimizations. The traditional drawbacks of analytical modeling have been its limited precision and the lack of techniques for applying it systematically without user intervention. In this work we present a methodology to build analytical models for codes with regular access patterns. These models can be applied to caches with arbitrary size, line size and associativity. Their validation through simulations of typical scientific code fragments shows a good degree of accuracy.
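
The paper's actual model equations are not reproduced here; the sketch below only illustrates the underlying idea under simplifying assumptions (a direct-mapped cache and a purely sequential traversal, whereas the paper handles arbitrary associativity): for a regular access pattern, the miss count can be predicted in closed form and then checked against a tiny trace-driven simulation.

```java
// Illustration of the idea behind analytical cache modeling (not the paper's
// model): for a regular, sequential traversal, misses can be predicted in
// closed form and verified against a trace-driven simulation.
public class CacheMissEstimate {

    public static void main(String[] args) {
        final int N = 1 << 20;          // array elements (doubles, 8 bytes each)
        final int LINE = 64;            // cache line size in bytes
        final int LINES = 512;          // lines in a direct-mapped cache (32 KiB)

        // Analytical estimate: a sequential walk touches each line exactly once,
        // so misses are (bytes accessed) / (line size), all of them compulsory.
        long predicted = (long) N * 8 / LINE;

        // Trace-driven check: simulate a direct-mapped cache over the same
        // access stream (one 8-byte access per element, addresses 0, 8, 16, ...).
        long[] tags = new long[LINES];
        java.util.Arrays.fill(tags, -1);
        long misses = 0;
        for (long i = 0; i < N; i++) {
            long addr = i * 8;
            long lineAddr = addr / LINE;
            int set = (int) (lineAddr % LINES);
            if (tags[set] != lineAddr) {   // miss: load the line
                tags[set] = lineAddr;
                misses++;
            }
        }

        System.out.println("predicted misses = " + predicted);  // 131072
        System.out.println("simulated misses = " + misses);     // matches
    }
}
```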


IEEE Transactions on Communications | 1997

High-performance VLSI architecture for the Viterbi algorithm

Montserrat Bóo; Francisco Argüello; Javier D. Bruguera; Ramón Doallo; Emilio L. Zapata

The Viterbi (1967) algorithm (VA) is known to be an efficient method for the realization of maximum-likelihood (ML) decoding of convolutional codes. The VA is characterized by a graph, called a trellis, which defines the transitions between states. Defining an area-efficient architecture for the VA is equivalent to obtaining an efficient mapping of the trellis. We present a methodology that permits the efficient hardware mapping of the VA onto a processor network of arbitrary size. This formal model is employed for partitioning the computations among an arbitrary number of processors in such a way that the data are recirculated, optimizing the use of the processing elements (PEs) and the communications. The algorithm is thus mapped onto a column of processing elements, and an optimal design solution is obtained for a particular set of area and/or speed constraints. Furthermore, the management of the surviving path memory, and its mapping and distribution among the processors, was studied. As a result, we obtain a regular and modular design appropriate for VLSI implementation in which the only communications needed between processors are the data recirculations between stages.
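
The paper is about the hardware mapping of the trellis rather than the algorithm itself. As background, the self-contained Java sketch below implements a textbook hard-decision Viterbi decoder for a small rate-1/2, constraint-length-3 code (generators 7 and 5 in octal); this particular code is an assumed example, not one taken from the paper.

```java
import java.util.Arrays;

// Minimal hard-decision Viterbi decoder for a rate-1/2, constraint-length-3
// convolutional code (generators 7 and 5, octal). Illustrates the trellis
// computation the paper maps onto hardware; it is not the paper's architecture.
public class ViterbiDemo {
    static final int STATES = 4;                 // 2^(K-1), K = 3

    // Encoder output (2 bits) for input bit b in 'state'.
    // state = (previous bit << 1) | bit before that.
    static int output(int state, int b) {
        int p1 = (state >> 1) & 1, p2 = state & 1;
        int o1 = b ^ p1 ^ p2;                    // generator 111 (octal 7)
        int o2 = b ^ p2;                         // generator 101 (octal 5)
        return (o1 << 1) | o2;
    }

    static int nextState(int state, int b) {
        return (b << 1) | ((state >> 1) & 1);
    }

    static int[] encode(int[] bits) {
        int[] out = new int[bits.length];
        int state = 0;
        for (int i = 0; i < bits.length; i++) {
            out[i] = output(state, bits[i]);
            state = nextState(state, bits[i]);
        }
        return out;
    }

    // Forward pass keeps, per state, the best path metric (Hamming distance)
    // and the surviving predecessor; traceback recovers the decoded bits.
    static int[] decode(int[] received) {
        int n = received.length;
        int[][] survivorBit = new int[n][STATES];
        int[][] survivorPrev = new int[n][STATES];
        int[] metric = new int[STATES];
        Arrays.fill(metric, Integer.MAX_VALUE / 2);
        metric[0] = 0;                           // encoder starts in state 0

        for (int t = 0; t < n; t++) {
            int[] next = new int[STATES];
            Arrays.fill(next, Integer.MAX_VALUE / 2);
            for (int s = 0; s < STATES; s++) {
                for (int b = 0; b <= 1; b++) {
                    int ns = nextState(s, b);
                    int branch = Integer.bitCount(output(s, b) ^ received[t]);
                    int cand = metric[s] + branch;
                    if (cand < next[ns]) {       // keep the better incoming path
                        next[ns] = cand;
                        survivorPrev[t][ns] = s;
                        survivorBit[t][ns] = b;
                    }
                }
            }
            metric = next;
        }

        // Traceback from the best final state.
        int best = 0;
        for (int s = 1; s < STATES; s++) if (metric[s] < metric[best]) best = s;
        int[] bits = new int[n];
        for (int t = n - 1; t >= 0; t--) {
            bits[t] = survivorBit[t][best];
            best = survivorPrev[t][best];
        }
        return bits;
    }

    public static void main(String[] args) {
        int[] msg = {1, 0, 1, 1, 0, 0, 1, 0};
        int[] coded = encode(msg);
        coded[2] ^= 0b10;                        // inject a single channel error
        System.out.println(Arrays.toString(decode(coded)));  // recovers msg
    }
}
```

The doubly nested loop over states and input bits is the add-compare-select step; that per-stage computation, together with the survivor memory, is what the paper partitions and recirculates among processing elements.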


International Symposium on Microarchitecture | 2009

Adaptive line placement with the set balancing cache

Dyer Rolán; Basilio B. Fraguela; Ramón Doallo

Efficient memory hierarchy design is critical due to the increasing gap between processor and memory speed. One source of inefficiency in current caches is the non-uniform distribution of memory accesses across the cache sets. As a consequence, some cache sets hold working sets that are far from fitting in them, while other sets are underutilized because their working sets have fewer lines than the set can hold. In this paper we present a technique that aims to balance the pressure on the cache sets by detecting when it may be beneficial to associate sets, displacing lines from stressed sets to underutilized ones. This new technique, called the set balancing cache, or SBC, achieved an average reduction of 13% in the miss rate of ten benchmarks from the SPEC CPU2006 suite, resulting in an average IPC improvement of 5%.
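
The SBC mechanism itself is not reproduced here. The Java sketch below, under assumed cache parameters, only illustrates the problem it targets: a strided access pattern that concentrates all of its misses on a single set of a set-associative cache while the remaining sets stay idle.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the problem the set balancing cache addresses (not the SBC
// mechanism): a strided pattern can stress one set while the rest sit idle.
public class SetPressureDemo {
    static final int SETS = 64, WAYS = 4, LINE = 64;   // 16 KiB, 4-way, 64 B lines

    public static void main(String[] args) {
        // One LRU stack per set, plus per-set miss counters.
        List<ArrayDeque<Long>> sets = new ArrayList<>();
        for (int i = 0; i < SETS; i++) sets.add(new ArrayDeque<>());
        long[] misses = new long[SETS];

        // Access pattern: repeatedly walk 32 addresses spaced SETS*LINE bytes
        // apart, so every access maps to set 0 and thrashes it, while the
        // other 63 sets receive no pressure at all.
        for (int rep = 0; rep < 100; rep++) {
            for (int k = 0; k < 32; k++) {
                long addr = (long) k * SETS * LINE;     // stride of SETS*LINE bytes
                long lineAddr = addr / LINE;
                int set = (int) (lineAddr % SETS);      // always 0 for this pattern
                ArrayDeque<Long> lru = sets.get(set);
                if (lru.remove(lineAddr)) {
                    lru.addFirst(lineAddr);             // hit: move to MRU position
                } else {
                    misses[set]++;                      // miss: insert, evict LRU
                    lru.addFirst(lineAddr);
                    if (lru.size() > WAYS) lru.removeLast();
                }
            }
        }
        System.out.println("per-set misses: " + Arrays.toString(misses));
        // All 3200 misses land on set 0; a set-balancing scheme would shift
        // part of that working set into the neighbouring, underutilized sets.
    }
}
```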

Collaboration


Dive into Ramón Doallo's collaborations.

Top Co-Authors

Julio R. Banga

Spanish National Research Council
