Oscar Torreno
University of Málaga
Publications
Featured research published by Oscar Torreno.
International Symposium on Parallel and Distributed Processing and Applications | 2012
Johan Karlsson; Oscar Torreno; Daniel Ramet; Gunter Klambauer; M. Cano; Oswaldo Trelles
The petabyte scale of Big Data generation in bioinformatics requires advanced computational techniques to enable efficient knowledge discovery from data. Many data analysis tools in bioinformatics have been developed, but few have been adapted to take advantage of high-performance computing (HPC) resources. For some of these tools, an attractive option is to employ a map/reduce strategy. On the other hand, Cloud Computing could be an important platform on which to run such tools in parallel because it provides on-demand, elastic computational resources. This paper presents a software suite for Microsoft Azure that supports legacy software (without modifying the algorithms). We demonstrate the feasibility of the approach by benchmarking a typical bioinformatics tool, namely dotplot.
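The dotplot benchmark lends itself to a map/reduce formulation because every word lookup is independent. A minimal sketch of the idea (an illustrative toy, not the Azure suite described in the paper) might look like this:

```python
# Toy dot-plot: the "map" phase indexes words of one sequence, the
# "reduce" phase looks up words of the other. Each lookup is independent,
# which is what makes the computation easy to distribute.

def dotplot(seq_a, seq_b, word_size=3):
    """Return (i, j) positions where seq_a and seq_b share a word."""
    index = {}
    for i in range(len(seq_a) - word_size + 1):          # "map" phase
        index.setdefault(seq_a[i:i + word_size], []).append(i)
    hits = []
    for j in range(len(seq_b) - word_size + 1):          # "reduce" phase
        for i in index.get(seq_b[j:j + word_size], []):
            hits.append((i, j))
    return hits

print(dotplot("GATTACA", "ATTAC"))  # → [(1, 0), (2, 1), (3, 2)]
```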
BMC Bioinformatics | 2015
Oscar Torreno; Oswaldo Trelles
Background: Conventional pairwise sequence comparison software algorithms are being used to process much larger datasets than they were originally designed for. This can result in processing bottlenecks that limit software capabilities or prevent full use of the available hardware resources. Overcoming the barriers that limit the efficient computational analysis of large biological sequence datasets, by retrofitting existing algorithms or by creating new applications, represents a major challenge for the bioinformatics community.
Results: We have developed C libraries for pairwise sequence comparison within diverse architectures, ranging from commodity systems to high-performance and cloud computing environments. Exhaustive tests were performed using different datasets of closely- and distantly-related sequences that span from small viral genomes to large mammalian chromosomes. The tests demonstrated that our solution is capable of generating high-quality results with a linear-time response and controlled memory consumption, and is comparable to or faster than the current state-of-the-art methods.
Conclusions: We have addressed the problem of pairwise and all-versus-all comparison of large sequences in general, greatly increasing the limits on input data size. The approach described here is based on a modular out-of-core strategy that uses secondary storage to avoid reaching memory limits during the identification of High-scoring Segment Pairs (HSPs) between the sequences under comparison. Software engineering concepts were applied to avoid intermediate result re-calculation, to minimise the performance impact of input/output (I/O) operations and to modularise the process, thus enhancing application flexibility and extendibility. Our computationally-efficient approach allows tasks such as the massive comparison of complete genomes, evolutionary event detection, the identification of conserved synteny blocks and inter-genome distance calculations to be performed more effectively.
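The out-of-core strategy can be pictured with a short sketch: seed matches between sequence chunks are spilled to secondary storage instead of being held in RAM, so peak memory stays bounded regardless of input size. The chunking, seed length and spill format below are illustrative assumptions, not the paper's actual C implementation or file formats (and, for brevity, the sketch misses seeds that straddle a chunk boundary).

```python
import itertools
import pickle
import tempfile

def chunks(seq, size):
    """Yield (offset, slice) pieces of a sequence."""
    for start in range(0, len(seq), size):
        yield start, seq[start:start + size]

def seed_hits(chunk_a, chunk_b, k):
    """Exact k-word matches between two in-memory chunks."""
    index = {}
    for i in range(len(chunk_a) - k + 1):
        index.setdefault(chunk_a[i:i + k], []).append(i)
    return [(i, j) for j in range(len(chunk_b) - k + 1)
            for i in index.get(chunk_b[j:j + k], [])]

def compare_out_of_core(seq_a, seq_b, chunk=1 << 20, k=4):
    """Compare two sequences chunk pair by chunk pair, spilling hits
    to a temporary file so memory use stays bounded."""
    with tempfile.TemporaryFile() as spill:
        for (oa, ca), (ob, cb) in itertools.product(chunks(seq_a, chunk),
                                                    chunks(seq_b, chunk)):
            for i, j in seed_hits(ca, cb, k):
                pickle.dump((oa + i, ob + j), spill)  # absolute coordinates
        spill.seek(0)
        hits = []
        while True:
            try:
                hits.append(pickle.load(spill))
            except EOFError:
                return sorted(hits)
```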
BMC Genomics | 2016
Esteban Pérez-Wohlfeil; Jose A. Arjona-Medina; Oscar Torreno; Eugenia Ulzurrun; Oswaldo Trelles
Background: The field of metagenomics, defined as the direct genetic analysis of uncultured genomes contained within an environmental sample, is gaining increasing popularity. The aim of metagenomic studies is to determine the species present in an environmental community and to identify changes in the abundance of species under different conditions. Current metagenomic analysis software faces bottlenecks due to the high computational load required to analyse complex samples.
Results: A computational open-source workflow has been developed for the detailed analysis of metagenomes. This workflow provides new tools and datafile specifications that facilitate the identification of differences in the abundance of reads assigned to taxa (mapping), enables the detection of reads of low-abundance bacteria (producing evidence of their presence), and provides new concepts for filtering spurious matches. Innovative visualization ideas for improved display of metagenomic diversity are also proposed to better understand how reads are mapped to taxa. Illustrative examples are provided based on the study of two collections of metagenomes from faecal microbial communities of adult female monozygotic and dizygotic twin pairs concordant for leanness or obesity, and their mothers.
Conclusions: The proposed workflow provides an open environment that offers the opportunity to perform the mapping process using different reference databases. Additionally, this workflow documents the specifications of the mapping process and datafile formats to facilitate the development of new plugins for further post-processing. This open and extensible platform has been designed with the aim of enabling in-depth analysis of metagenomic samples and a better understanding of the underlying biological processes.
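The abundance-mapping step described above can be sketched in a few lines: reads assigned to taxa are tallied, and low-abundance taxa are flagged rather than silently discarded, preserving evidence of their presence. The taxon names and threshold are assumptions for illustration, not the workflow's actual datafile formats.

```python
from collections import Counter

def abundance_table(read_assignments, min_fraction=0.1):
    """Tally reads per taxon and flag whether each taxon clears a
    relative-abundance cutoff (False marks low-abundance evidence)."""
    counts = Counter(read_assignments)
    total = sum(counts.values())
    return {taxon: (n, n / total, n / total >= min_fraction)
            for taxon, n in counts.items()}

# 20 reads mapped to three (hypothetical) taxa
reads = ["Bacteroides"] * 8 + ["Firmicutes"] * 11 + ["Akkermansia"]
print(abundance_table(reads))
```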
International Conference on Cloud Computing and Services Science | 2014
Paul Heinzlreiter; James R. Perkins; Oscar Torreno; Johan Karlsson; Juan Antonio Ranea; Andreas Mitterecker; Miguel Blanca; Oswaldo Trelles
The cost of obtaining genome-scale biomedical data continues to drop rapidly, with many hospitals and universities now able to produce large amounts of data. Managing and analysing such ever-growing datasets is becoming a crucial issue. Cloud computing presents a good solution to this problem due to its flexibility in obtaining computational resources. However, it is essential to allow end-users with no experience to take advantage of the cloud computing model of elastic resource provisioning. This paper presents a workflow that allows the end user to perform the core steps of a genome-wide association analysis, consisting of 1) uploading raw data files to the cloud, 2) genotype calling, i.e. converting the raw data into information on genome variation, 3) quality assessment of the genotype data and subsequent filtering based on user-provided parameters, and 4) downloading the filtered genotype data in a standard file format. A number of steps in this process are computationally intensive; moreover, the computational resources involved vary greatly depending on the size of the study, from a few samples to a few thousand. Therefore, cloud computing provides an ideal solution to this problem. The paper describes in detail how the pipeline was implemented, focussing on the cloud infrastructure and the different tools and software involved for software management and GUI construction. The key contributions of this paper are: a) it presents a real-world application of cloud computing to address a critical problem in biomedicine; b) it provides a thorough description of how such a pipeline was implemented, in terms of data management and user interface, so that the end user does not need to focus on the computational aspects but can instead concentrate on data analysis and biological interpretation of results; and c) it shows how cloud computing can be used more effectively through parallelisation of the appropriate parts of the pipeline.
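Step 3 of the pipeline, quality filtering on user-provided parameters, can be illustrated with a toy call-rate filter. The data layout and the 0.95 default are assumptions for illustration; the actual pipeline operates on standard genotype file formats.

```python
def filter_by_call_rate(snps, min_call_rate=0.95):
    """Keep SNP ids whose fraction of successfully called samples
    (None marks a failed call) meets the user-provided threshold."""
    kept = []
    for snp_id, calls in snps:
        called = sum(1 for c in calls if c is not None)
        if called / len(calls) >= min_call_rate:
            kept.append(snp_id)
    return kept

snps = [("rs1", ["AA", "AG", None, "GG"]),   # 75% call rate: filtered out
        ("rs2", ["AA", "AA", "AG", "GG"])]   # 100% call rate: kept
print(filter_by_call_rate(snps))  # → ['rs2']
```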
International Conference on Conceptual Structures | 2015
Oscar Torreno; Michael T. Krieger; Paul Heinzlreiter; Oswaldo Trelles
Workflows are becoming the new paradigm in bioinformatics. In general, bioinformatics problems are solved by interconnecting several small software pieces to perform complex analyses. This demands a certain expertise to create, enact and monitor such tool compositions. In addition, bioinformatics is immersed in big-data territory, facing major challenges in analysing such amounts of data. We have addressed these problems by integrating a tool management platform (Galaxy) with a Cloud infrastructure, which avoids moving large datasets between different locations and allows dynamic scaling of the computing resources depending on user needs. The result is a user-friendly platform that facilitates the work of end-users while performing their experiments, installed in a Cloud environment that includes authentication, security and big-data transfer mechanisms. To demonstrate the suitability of our approach, we have integrated into the infrastructure an existing pairwise and multiple genome comparison tool, which involves the management of huge datasets and high computational demands.
International Conference on Bioinformatics and Biomedical Engineering | 2017
Esteban Pérez-Wohlfeil; Oscar Torreno; Oswaldo Trelles
Traditional comparisons between metagenomes are often performed using reference databases as intermediary templates from which to obtain distance metrics. However, to fully exploit the potential of the information contained within metagenomes, it is of interest to remove any intermediate agent that is prone to introducing errors or biased results. In this work, we analyse the state-of-the-art methods and conclude that fine-grained methods are necessary to assess similarity between metagenomes. In addition, we propose a method we have developed for accurate and fast matching of reads.
Cluster Computing | 2017
Oscar Torreno; Oswaldo Trelles
Genome comparison poses important computational challenges, especially in CPU time, memory allocation and I/O operations. Although parallel approaches to multiple sequence comparison algorithms already exist, they face a significant limitation on input sequence length. GECKO appeared as a computationally and memory-efficient method to overcome this limitation. However, its performance could be greatly increased by applying parallel strategies and I/O optimisations. We have applied two different strategies to accelerate GECKO while producing the same results. The first is a two-level parallel approach, parallelising each independent internal pairwise comparison at the first level and the GECKO modules at the second level. The second approach consists of a complete rewrite of the original code to reduce I/O. Both strategies outperform the original code, which was already faster than equivalent software. Thus, much faster pairwise and multiple genome comparisons can be performed, which is especially important given the ever-growing list of available genomes.
European Conference on Parallel Processing | 2016
Oscar Torreno; Oswaldo Trelles
We present a two-level parallel strategy focused on the enhancement of the GECKO software for multiple and pairwise genome comparisons. GECKO was developed to break the computational barriers on search space and memory demands faced by equivalent software. However, although it is faster than equivalent software for comparing long sequences, its execution time attracted our interest in developing a parallel strategy. Additionally, the execution time is even higher in multiple genome comparisons, where several independent pairwise comparisons are typically performed sequentially. After a careful study of the internal data dependencies of the GECKO modules, we noticed that most of them were amenable to easy and efficient parallelization. The result is a two-level parallel approach to accelerate multiple genome comparisons. The first level parallelizes each independent pairwise genome comparison of a multiple comparison study on a different core. This level is application-independent: we use GECKO, but any other equivalent software could be used. The second level consists of the internal parallelization of the GECKO modules, with evident enhancements in performance while results remain invariant. After solving the problems of overlapping the large number of I/O operations with computation, the obtained speedups reflect the good efficiency of the devised strategy.
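The first, application-independent level can be sketched with a process pool: each pairwise comparison of a multiple-comparison study runs on its own core. `compare` below is a stand-in for invoking GECKO (or any equivalent tool) on one genome pair; here it merely counts shared characters so the sketch stays runnable.

```python
from itertools import combinations
from multiprocessing import Pool

def compare(pair):
    """Placeholder for one pairwise comparison (GECKO or equivalent);
    here it simply counts the characters the two sequences share."""
    a, b = pair
    return (a, b, len(set(a) & set(b)))

def all_versus_all(genomes, workers=4):
    """First parallel level: distribute the independent pairwise
    comparisons of a multiple-comparison study over a pool of cores."""
    with Pool(workers) as pool:
        return pool.map(compare, combinations(genomes, 2))

if __name__ == "__main__":
    print(all_versus_all(["GATTACA", "ATTAC", "CCGG"]))
```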
International Conference on Algorithms and Architectures for Parallel Processing | 2017
Esteban Pérez-Wohlfeil; Oscar Torreno; Oswaldo Trelles
In this manuscript, we present an optimized and parallel version of our previous work IMSAME, an exhaustive gapped aligner for the pairwise and accurate comparison of metagenomes. Parallelization strategies are applied to take advantage of modern multiprocessor architectures. In addition, sequential optimizations in CPU time and memory consumption are provided. These algorithmic and computational enhancements enable IMSAME to calculate near-optimal alignments, which are used to directly assess similarity between metagenomes without requiring reference databases. We show that the overall efficiency of the parallel implementation is above 80% while retaining scalability as the number of parallel cores increases. Moreover, we show that the sequential optimizations yield up to 8× speedup for scenarios with larger datasets.
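The quoted efficiency figure follows the standard definition, speedup divided by core count; for example, a 4× speedup on 5 cores gives 0.8, i.e. 80% efficiency:

```python
def parallel_efficiency(t_serial, t_parallel, cores):
    """Efficiency = speedup / cores, where speedup = t_serial / t_parallel."""
    return (t_serial / t_parallel) / cores

print(parallel_efficiency(100.0, 25.0, 5))  # 4x speedup on 5 cores → 0.8
```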
Future Generation Computer Systems | 2017
Michael T. Krieger; Oscar Torreno; Oswaldo Trelles; Dieter Kranzlmüller