Is this you? Create Your Porfile

Friman Sánchez

Polytechnic University of Catalonia

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Friman Sánchez is active.

Explore More

Publication

Featured researches published by Friman Sánchez.

international symposium on microarchitecture | 2010

The SARC Architecture

Alex Ramirez; Felipe Cabarcas; Ben H. H. Juurlink; Mauricio Alvarez Mesa; Friman Sánchez; Arnaldo Azevedo; Cor Meenderinck; Catalin Bogdan Ciobanu; Sebastian Isaza; Gerogi Gaydadjiev

The SARC architecture is composed of multiple processor types and a set of user-managed direct memory access (DMA) engines that let the runtime scheduler overlap data transfer and computation. The runtime system automatically allocates tasks on the heterogeneous cores and schedules the data transfers through the DMA engines. SARCs programming model supports various highly parallel applications, with matching support from specialized accelerator processors.

european conference on parallel processing | 2010

Long DNA sequence comparison on multicore architectures

Friman Sánchez; Felipe Cabarcas; Alex Ramirez; Mateo Valero

Biological sequence comparison is one of the most important tasks in Bioinformatics. Due to the growth of biological databases, sequence comparison is becoming an important challenge for high performance computing, especially when very long sequences are compared. The Smith-Waterman (SW) algorithm is an exact method based on dynamic programming to quantify local similarity between sequences. The inherent large parallelism of the algorithm makes it ideal for architectures supporting multiple dimensions of parallelism (TLP, DLP and ILP). In this work, we show how long sequences comparison takes advantage of current and future multicore architectures. We analyze two different SW implementations on the CellBE and use simulation tools to study the performance scalability in a multicore architecture. We study the memory organization that delivers the maximum bandwidth with the minimum cost. Our results show that a heterogeneous architecture is an valid alternative to execute challenging bioinformatic workloads.

ieee international symposium on workload characterization | 2005

Parallel processing in biological sequence comparison using general purpose processors

Friman Sánchez; Esther Salamí; Alex Ramirez; Mateo Valero

The comparison and alignment of DNA and protein sequences are important tasks in molecular biology and bioinformatics. One of the most well known algorithms to perform the string-matching operation present in these tasks is the Smith-Waterman algorithm (SW). However, it is a computation intensive algorithm, and many researchers have developed heuristic strategies to avoid using it, specially when using large databases to perform the search. There are several efficient implementations of the SW algorithm on general purpose processors. These implementations try to extract data-level parallelism taking advantage of single-instruction multiple-data extensions (SIMD), capable of performing several operations in parallel on a set of data. In this paper, we propose a more efficient data parallel implementation of the SW algorithm. Our proposed implementation obtains a 30% reduction in the execution time relative to the previous best data-parallel alternative. In this paper we review different alternative implementation of the SW algorithm, compare them with our proposal, and present preliminary results for some heuristic implementations. Finally, we present a detailed study of the computational complexity of the different alignment algorithms presented and their behavior on the different aspect of the CPU microarchitecture.

international symposium on performance analysis of systems and software | 2005

On the Scalability of 1- and 2-Dimensional SIMD Extensions for Multimedia Applications

Friman Sánchez; Mauricio Alvarez; E. Salamf; Alex Ramirez; Mateo Valero

SIMD extensions are the most common technique used in current processors for multimedia computing. In order to obtain more performance for emerging applications SIMD extensions need to be scaled. In this paper we perform a scalability analysis of SIMD extensions for multimedia applications. Scaling a 1-dimensional extension, like Intel MMX, was compared to scaling a 2-dimensional (matrix) extension. Evaluations have demonstrated that the 2-d architecture is able to use more parallel hardware than the 1-d extension. Speed-ups over a 2-way superscalar processor with MMX-like extension go up to 4X for kernels and up to 3.3X for complete applications and the matrix architecture can deliver, in some cases, more performance with simpler processor configurations. The experiments also show that the scaled matrix architecture is reaching the limits of the DLP available in the internal loops of common multimedia kernelsSIMD extensions are the most common technique used in current processors for multimedia computing. In order to obtain more performance for emerging applications SIMD extensions need to be scaled. In this paper we perform a scalability analysis of SIMD extensions for multimedia applications. Scaling a 1-dimensional extension, like Intel MMX, was compared to scaling a 2-dimensional (matrix) extension. Evaluations have demonstrated that the 2-d architecture is able to use more parallel hardware than the 1-d extension. Speed-ups over a 2-way superscalar processor with MMX-like extension go up to 4X for kernels and up to 3.3X for complete applications and the matrix architecture can deliver, in some cases, more performance with simpler processor configurations. The experiments also show that the scaled matrix architecture is reaching the limits of the DLP available in the internal loops of common multimedia kernels.

Concurrency and Computation: Practice and Experience | 2011

Scalable multicore architectures for long DNA sequence comparison

Friman Sánchez; Felipe Cabarcas; Alex Ramirez; Mateo Valero

Biological sequence comparison is one of the most important tasks in Bioinformatics. Owing to the fast growth of databases that contain biological information, sequence comparison represents an important challenge for high‐performance computing, especially when very long sequences are compared, i.e. the complete genome of several organisms. The Smith–Waterman (SW) algorithm is an exact method based on dynamic programming to quantify local similarity between sequences. The inherent large parallelism of the algorithm makes it ideal for architectures supporting multiple dimensions of parallelism (TLP, DLP and ILP). Concurrently, there is a paradigm shift towards chip multiprocessors in computer architecture, which offer a huge amount of potential performance that can only be exploited efficiently if applications are effectively mapped and parallelized. In this work, we analyze how large‐scale biology sequence comparison takes advantage of the current and future multicore architectures. Our starting point is the performance analysis of the current multicore IBM Cell B.E. processor; we analyze two different SW implementations on the Cell B.E. Then, using simulation tools, we study the performance scalability when a many‐core architecture is used for performing long DNA sequence comparison. We investigate the efficient memory organization that delivers the maximum bandwidth with the minimum cost. Our results show that a heterogeneous architecture can be an efficient alternative to execute challenging bioinformatic workloads. Copyright

complex, intelligent and software intensive systems | 2010

Scalability Analysis of Progressive Alignment on a Multicore

Sebastian Isaza; Friman Sánchez; Georgi Gaydadjiev; Alex Ramirez; Mateo Valero

Sequence alignment is a fundamental instrument in Bioinformatics. In recent years, numerous proposals have been addressing the problem of accelerating this class of applications. This, due to the rapid growth of sequence databases in combination with the high computational demands imposed by the algorithms. In this paper we focus on the analysis of the progressive alignment in ClustalW, a widely used program for performing multiple sequence alignment. We have parallelized ClustalW for the Cell processor architecture and have carefully analyzed the scalability of its different phases with both the number of cores used and the input size. Experimental results show that computing profile scores scales well up to 16 SPE cores. With the increase of the input size, profiles initialization in the PPE core becomes the predominant bottleneck.

international conference / workshop on embedded computer systems: architectures, modeling and simulation | 2008

Preliminary Analysis of the Cell BE Processor Limitations for Sequence Alignment Applications

Sebastian Isaza; Friman Sánchez; Georgi Gaydadjiev; Alex Ramirez; Mateo Valero

The fast growth of bioinformatics field has attracted the attention of computer scientists in the last few years. At the same time the increasing database sizes require greater efforts to improve the computational performance. From a computer architecture point of view, we intend to investigate how bioinformatics applications can benefit from future multi-core processors. In this paper we present a preliminary study of the Cell BE processor limitations when executing two representative sequence alignment applications (Ssearch and ClustalW). The inherent large parallelism of the targeted algorithms makes them ideal for architectures supporting multiple dimensions of parallelism (TLP and DLP). However, in the case of Cell BE we identified several architectural limitations that need a careful study and quantification.

computing frontiers | 2011

Parametrizing multicore architectures for multiple sequence alignment

Sebastian Isaza; Friman Sánchez; Felipe Cabarcas; Alex Ramirez; Georgi Gaydadjiev

Sequence alignment is one of the fundamental tasks in bioinformatics. Due to the exponential growth of biological data and the computational complexity of the algorithms used, high performance computing systems are required. Although multicore architectures have the potential of exploiting the task-level parallelism found in these workloads, efficiently harnessing systems with hundreds of cores requires deep understanding of the applications and the architecture. When incorporating large numbers of cores, performance scalability will likely saturate shared hardware resources like buses and memories. In this paper we evaluate the performance impact of various configurations of an accelerator-based multicore architecture with the aim of revealing and quantifying the bottlenecks. Then, we compare against a multicore using general-purpose processors and discuss the performance gap. Our target application is ClustalW, one of the most popular programs for Multiple Sequence Alignment. Different input data sets are characterized and we show how they influence performance. Simulation results show that due to the high computation-to-communication ratio and the transfer of data in large chunks, memory latency is well tolerated. However, bandwidth is critical to achieving maximum performance. Using a 32KB cache configuration with 4 banks can capture most of the memory traffic and therefore avoid expensive off-chip transactions. On the other hand, using a hardware queue for the tasks synchronization allows us to handle a large number of cores. Finally, we show that using a simple load balancing strategy, we can increase performance of general-purpose cores by 28%.

ieee international symposium on workload characterization | 2006

Performance Analysis of Sequence Alignment Applications

Friman Sánchez; Esther Salamí; Alex Ramirez; Mateo Valero

Advances in molecular biology have led to a continued growth in the biological information generated by the scientific community. Additionally, this area has become a multi-disciplinary field, including components of mathematics, biology, chemistry, and computer science, generating several challenges in the scientific community from different points of view. For this reason, bioinformatic applications represent an increasingly important workload. However, even though the importance of this field is clear, common bioinformatic applications and their implications on micro-architectural design have not received enough attention from the computer architecture community. This paper presents a micro-architecture performance analysis of recognized bioinformatic applications for the comparison and alignment of biological sequences, including BLAST, FASTA and some recognized parallel implementations of the Smith-Waterman algorithm that use the Altivec SIMD extension to speed-up the performance. We adopt a simulation-based methodology to perform a detailed workload characterization. We analyze architectural and micro-architectural aspects like pipeline configurations, issue widths, functional unit mixes, memory hierarchy and their implications on the performance behavior. We have found that the memory subsystem is the component with more impact in the performance of the BLAST heuristic, the branch predictor is responsible for the major performance loss for FASTA and SSEARCH34, and long dependency chains are the limiting factor in the SIMD implementations of Smith-WatermanAdvances in molecular biology have led to a continued growth in the biological information generated by the scientific community. Additionally, this area has become a multi-disciplinary field, including components of mathematics, biology, chemistry, and computer science, generating several challenges in the scientific community from different points of view. For this reason, bioinformatic applications represent an increasingly important workload. However, even though the importance of this field is clear, common bioinformatic applications and their implications on micro-architectural design have not received enough attention from the computer architecture community. This paper presents a micro-architecture performance analysis of recognized bioinformatic applications for the comparison and alignment of biological sequences, including BLAST, FASTA and some recognized parallel implementations of the Smith-Waterman algorithm that use the Altivec SIMD extension to speed-up the performance. We adopt a simulation-based methodology to perform a detailed workload characterization. We analyze architectural and micro-architectural aspects like pipeline configurations, issue widths, functional unit mixes, memory hierarchy and their implications on the performance behavior. We have found that the memory subsystem is the component with more impact in the performance of the BLAST heuristic, the branch predictor is responsible for the major performance loss for FASTA and SSEARCH34, and long dependency chains are the limiting factor in the SIMD implementations of Smith-Waterman.

complex, intelligent and software intensive systems | 2011