Hector Ortega-Arranz
University of Valladolid
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Hector Ortega-Arranz.
international conference on high performance computing and simulation | 2013
Hector Ortega-Arranz; Yuri Torres; Diego R. Llanos; Arturo Gonzalez-Escribano
The Single-Source Shortest Path (SSSP) problem arises in many different fields. In this paper we present a GPU-based version of the Crauser et al. SSSP algorithm. Our work significantly speeds up the computation of the SSSP, not only with respect to the CPU-based version, but also to other state-of-the-art GPU implementation based on Dijkstra, due to Martin et al. Both GPU implementations have been evaluated using the last Nvidia architecture (Kepler). Our experimental results show that the new GPU-Crauser algorithm leads to speed-ups from 13× to 220× with respect to the CPU version and a performance gain of up to 17% with respect the GPU-Marten algorithm.
International Journal of Parallel Programming | 2015
Hector Ortega-Arranz; Yuri Torres; Arturo Gonzalez-Escribano; Diego R. Llanos
The single-source shortest path (SSSP) problem arises in many different fields. In this paper, we present a GPU SSSP algorithm implementation. Our work significantly speeds up the computation of the SSSP, not only with respect to a CPU-based version, but also to other state-of-the-art GPU implementations based on Dijkstra. Both GPU implementations have been evaluated using the latest NVIDIA architectures. The graphs chosen as input sets vary in nature, size, and fan-out degree, in order to evaluate the behavior of the algorithms for different data classes. Additionally, we have enhanced our GPU algorithm implementation using two optimization techniques: The use of a proper choice of threadblock size; and the modification of the GPU L1 cache memory state of NVIDIA devices. These optimizations lead to performance improvements of up to 23xa0% with respect to the non-optimized versions. In addition, we have made a platform comparison of several NVIDIA boards in order to distinguish which one is better for each class of graphs, depending on their features. Finally, we compare our results with an optimized sequential implementation of Dijkstra’s algorithm included in the reference Boost library, obtaining an improvement ratio of up to 19
The Journal of Supercomputing | 2014
Hector Ortega-Arranz; Yuri Torres; Arturo Gonzalez-Escribano; Diego R. Llanos
International Journal of High Performance Computing Applications | 2017
Ana Moretón; Hector Ortega-Arranz; Diego R. Llanos
times
International Journal of Parallel Programming | 2015
Hector Ortega-Arranz; Yuri Torres; Arturo Gonzalez-Escribano; Diego R. Llanos
Synthesis Lectures on Theoretical Computer Science | 2014
Hector Ortega-Arranz; Diego R. Llanos; Arturo Gonzalez-Escribano
× for some graph families, using less memory space.
ieee international conference on high performance computing data and analytics | 2014
Hector Ortega-Arranz; Yuri Torres; Diego R. Llanos; Arturo Gonzalez-Escribano; Emmanuel Jeannot; Julius Žilinskas
During the last years, GPU manycore devices have demonstrated their usefulness to accelerate computationally intensive problems. Although arriving at a parallelization of a highly parallel algorithm is an affordable task, the optimization of GPU codes is a challenging activity. The main reason for this is the number of parameters, programming choices, and tuning techniques available, many of them related with complex and sometimes hidden architecture details. A useful strategy to systematically attack these optimization problems is to characterize the different kernels of the application, and use this knowledge to select appropriate configuration parameters. The All-Pair Shortest-Path (APSP) problem is a well-known problem in graph theory whose objective is to find the shortest paths between any pairs of nodes in a graph. This problem can be solved by highly parallel and computational intensive tasks, being a good candidate to be exploited by manycore devices. In this paper, we use kernel characterization criteria to optimize an APSP algorithm implementation for NVIDIA GPUs. Our experimental results show that the combined use of proper configuration policies, and the concurrent kernels capability of new CUDA architectures, leads to a performance improvement of up to 62xa0% with respect to one of the possible configurations recommended by CUDA, considered as baseline.
Archive | 2013
Hector Ortega-Arranz; Yuri Torres; Diego R. Llanos
Nowadays the use of hardware accelerators, such as the graphics processing units or XeonPhi coprocessors, is key in solving computationally costly problems that require high performance computing. However, programming solutions for an efficient deployment for these kind of devices is a very complex task that relies on the manual management of memory transfers and configuration parameters. The programmer has to carry out a deep study of the particular data that needs to be computed at each moment, across different computing platforms, also considering architectural details.We introduce the controller concept as an abstract entity that allows the programmer to easily manage the communications and kernel launching details on hardware accelerators in a transparent way. This model also provides the possibility of defining and launching central processing unit kernels in multi-core processors with the same abstraction and methodology used for the accelerators. It internally combines different native programming...Nowadays the use of hardware accelerators, such as the graphics processing units or XeonPhi coprocessors, is key in solving computationally costly problems that require high performance computing. However, programming solutions for an efficient deployment for these kind of devices is a very complex task that relies on the manual management of memory transfers and configuration parameters. The programmer has to carry out a deep study of the particular data that needs to be computed at each moment, across different computing platforms, also considering architectural details. We introduce the controller concept as an abstract entity that allows the programmer to easily manage the communications and kernel launching details on hardware accelerators in a transparent way. This model also provides the possibility of defining and launching central processing unit kernels in multi-core processors with the same abstraction and methodology used for the accelerators. It internally combines different native programming models and technologies to exploit the potential of each kind of device. Additionally, the model also allows the programmer to simplify the proper selection of values for several configuration parameters that can be selected when a kernel is launched. This is done through a qualitative characterization process of the kernel code to be executed. Finally, we present the implementation of the controller model in a prototype library, together with its application in several case studies. Its use has led to reductions in the development and porting costs, with significantly low overheads in the execution times when compared to manually programmed and optimized solutions which directly use CUDA and OpenMP.
Archive | 2017
Javier Fresno; Hector Ortega-Arranz; Alejandro Ortega-Arranz; Arturo Gonzalez-Escribano; Diego R. Llanos
During the last decade, parallel processing architectures have become a powerful tool to deal with massively-parallel problems that require high performance computing (HPC). The last trend of HPC is the use of heterogeneous environments, that combine different computational processing devices, such as CPU-cores and graphics processing units (GPUs). Maximizing the performance of any GPU parallel implementation of an algorithm requires an in-depth knowledge about the GPU underlying architecture, becoming a tedious manual effort only suited for experienced programmers. In this paper, we present TuCCompi, a multi-layer abstract model that simplifies the programming on heterogeneous systems including hardware accelerators, by hiding the details of synchronization, deployment, and tuning. TuCCompi chooses optimal values for their configuration parameters using a kernel characterization provided by the programmer. This model is very useful to tackle problems characterized by independent, high computational-load independent tasks, such as embarrassingly-parallel problems. We have evaluated TuCCompi in different, real-world, heterogeneous environments using the all-pair shortest-path problem as a case study.
Archive | 2016
Alejandro Alonso-Mayo; Hector Ortega-Arranz; Arturo Gonzalez-Escribano
Many applications in different domains need to calculate the shortest-path between two points in a graph. In this paper we describe this shortest path problem in detail, starting with the classic Dijkstras algorithm and moving to more advanced solutions that are currently applied to road network routing, including the use of heuristics and precomputation techniques. Since several of these improvements involve subtle changes to the search space, it may be difficult to appreciate their benefits in terms of time or space requirements. To make methods more comprehensive and to facilitate their comparison, this book presents a single case study that serves as a common benchmark. The paper also compares the search spaces explored by the methods described, both from a quantitative and qualitative point of view, and including an analysis of the number of reached and settled nodes by different methods for a particular topology.