Agustin Arruabarrena
University of the Basque Country
Publications
Featured research published by Agustin Arruabarrena.
Journal of Physics: Condensed Matter | 2012
Xavier Andrade; Joseba Alberdi-Rodriguez; David A. Strubbe; Micael J. T. Oliveira; Fernando Nogueira; Alberto Castro; Javier Muguerza; Agustin Arruabarrena; Steven G. Louie; Alán Aspuru-Guzik; Angel Rubio; Miguel A. L. Marques
Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to parallelize octopus. We focus on the real-time variant of TDDFT, in which the time-dependent Kohn-Sham equations are propagated directly in time. This approach has great potential for execution on massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). To harness the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism, and that this strategy is also applicable to code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems on modern parallel architectures.
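The multi-level scheme combines parallelization over Kohn-Sham states with partitioning of the real-space grid. As an illustration only, the Python sketch below shows one way a flat process rank could be mapped onto a (block of states, block of grid points) pair. The function names, the row-major process grid and the contiguous split of a flattened grid index are assumptions made for this example. They are a simple stand-in and do not reproduce octopus's own mesh-partitioning code.

```python
def split_range(n, parts, index):
    """Return the half-open range [start, stop) of n items owned by `index`.

    The first n % parts owners receive one extra item, so block sizes differ
    by at most one.
    """
    base, extra = divmod(n, parts)
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return start, stop


def two_level_assignment(rank, n_states, n_points, p_states, p_domains):
    """Map a flat rank onto (state range, grid-point range).

    Hypothetical layout: ranks form a p_states x p_domains grid, so consecutive
    ranks share one block of states and split the grid points among them.
    """
    if not 0 <= rank < p_states * p_domains:
        raise ValueError("rank outside the process grid")
    state_group, domain_group = divmod(rank, p_domains)
    return (split_range(n_states, p_states, state_group),
            split_range(n_points, p_domains, domain_group))


if __name__ == "__main__":
    # 8 ranks: 2 groups of states x 4 spatial domains.
    for r in range(8):
        print(r, two_level_assignment(r, n_states=10, n_points=100,
                                      p_states=2, p_domains=4))
```

Because the state index and the grid index are split independently, the two levels can be tuned separately. For example, more domains can be used when the grid is large and more state groups when there are many orbitals.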
IEEE Transactions on Computers | 1991
Ramón Beivide; Enrique Herrada; José L. Balcázar; Agustin Arruabarrena
The authors introduce and study a family of interconnection schemes, the Midimew networks, based on circulant graphs of degree 4. A family of such circulants is determined and shown to be optimal with respect to two distance parameters simultaneously, namely maximum distance and average distance, among all circulants of degree 4. These graphs are regular, point-symmetric, and maximally connected, and one such optimal graph exists for any given number of nodes. The proposed interconnection schemes consist of mesh-connected networks with wrap-around links, and are isomorphic to the optimal-distance circulants previously considered. Ways to construct one such network for any number of nodes are shown, their suitability as interconnection schemes for multicomputers is examined, and some interesting particular cases are discussed. The problem of routing is also addressed, and a basic algorithm is provided that is adequate for implementing the routing policy required to convey messages along shortest paths between nodes.
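The optimality claim concerns two distance parameters of degree-4 circulant graphs C_N(a, b), in which node i is linked to i ± a and i ± b (mod N). The Python sketch below does not reproduce the paper's closed-form construction. It is a brute-force check under that definition: it computes the diameter and average distance by breadth-first search from node 0, which is sufficient because circulants are vertex-transitive, and then keeps the best jump pair. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def distances_from_zero(n, jumps):
    """BFS hop counts from node 0 in the circulant graph C_n(jumps).

    Circulant graphs are vertex-transitive, so distances from node 0
    describe the whole graph. Unreachable nodes keep the value -1.
    """
    steps = [s % n for j in jumps for s in (j, -j)]
    dist = [-1] * n
    dist[0] = 0
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for s in steps:
            v = (u + s) % n
            if dist[v] < 0:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def distance_params(n, jumps):
    """Return (diameter, average distance to the other nodes), or None if disconnected."""
    dist = distances_from_zero(n, jumps)
    if min(dist) < 0:
        return None
    others = dist[1:]
    return max(others), sum(others) / (n - 1)


def best_degree4_circulant(n):
    """Brute-force the jump pair (a, b) minimising diameter, then average distance.

    Restricting 1 <= a < b <= (n - 1) // 2 keeps the four neighbours i +/- a and
    i +/- b distinct, so every candidate has degree exactly 4 (needs n >= 5).
    """
    best = None
    for a, b in combinations(range(1, (n - 1) // 2 + 1), 2):
        params = distance_params(n, (a, b))
        if params is not None and (best is None or params < best[0]):
            best = (params, (a, b))
    return best


if __name__ == "__main__":
    print(best_degree4_circulant(13))
```

For N = 13 = 2·2² + 2·2 + 1, the search finds diameter 2. This is the smallest possible value, because a degree-4 circulant of diameter k has at most 2k² + 2k + 1 nodes. The exhaustive search costs O(N³) and is only practical for checking small cases. The point of a closed-form family is that it gives the optimal graph for any N directly.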
IEEE Parallel & Distributed Technology: Systems & Applications | 1996
José Miguel; Agustin Arruabarrena; Ramón Beivide; José A. Gregorio
This evaluation shows the effect that the recent upgrade to the IBM SP2 communication subsystem has on the execution of parallel applications, indicating that only under certain circumstances does a significant performance increase result.
Journal of Computational Chemistry | 2014
Pablo García-Risueño; Joseba Alberdi-Rodriguez; Micael J. T. Oliveira; Xavier Andrade; Michael Pippig; Javier Muguerza; Agustin Arruabarrena; Angel Rubio
We present an analysis of different methods to calculate the classical electrostatic Hartree potential created by charge distributions. Our goal is to give the reader an estimate of the performance, in terms of both numerical complexity and accuracy, of popular Poisson solvers, and an intuitive idea of the way these solvers operate. Highly parallelizable routines have been implemented in a first-principles simulation code (Octopus) for use in our tests, so that reliable conclusions can be drawn from our work about the capability of these methods to tackle large systems in cluster computing.
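Every method compared here computes the same quantity: the solution of the Poisson equation ∇²v_H = -4πρ (atomic units) for a charge density ρ. The following sketch is a deliberately one-dimensional toy, -u'' = f with zero boundary values, solved by direct tridiagonal (Thomas) elimination. It is not one of the three-dimensional solvers benchmarked in the paper, and the function name is hypothetical.

```python
def solve_poisson_1d(f, h):
    """Solve -u'' = f at n interior grid points with u = 0 at both ends.

    Discretisation: (-u[i-1] + 2*u[i] - u[i+1]) / h**2 = f[i], a tridiagonal
    system solved with the Thomas algorithm in O(n) operations.
    """
    n = len(f)
    c = [0.0] * n  # modified super-diagonal coefficients
    d = [0.0] * n  # modified right-hand side
    c[0] = -1.0 / 2.0
    d[0] = h * h * f[0] / 2.0
    for i in range(1, n):
        denom = 2.0 + c[i - 1]  # diagonal 2 minus (sub-diagonal -1) * c[i-1]
        c[i] = -1.0 / denom
        d[i] = (h * h * f[i] + d[i - 1]) / denom
    u = [0.0] * n
    u[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        u[i] = d[i] - c[i] * u[i + 1]
    return u


if __name__ == "__main__":
    n = 9
    h = 1.0 / (n + 1)
    xs = [(i + 1) * h for i in range(n)]
    u = solve_poisson_1d([2.0] * n, h)
    print(max(abs(ui - x * (1 - x)) for ui, x in zip(u, xs)))
```

For f = 2, the exact solution is u = x(1 - x). This is a quadratic, which the three-point stencil reproduces up to rounding error. In 3D the same discrete problem is no longer tridiagonal. That is why the paper weighs iterative, FFT-based and other approaches by cost and accuracy.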
Parallel Processing Letters | 1993
Agustin Arruabarrena; Ramon Beivide; Cruz Izu; J. Miguel
The performance of the communication network of a massively parallel processor depends, among other parameters, on the network topology, the message flow control and the routing mechanisms. This paper analyses the gains in average message latency and maximum sustained throughput that can be achieved by using an adaptive routing strategy instead of an oblivious one. Two-dimensional mesh and torus topologies have been studied, using cut-through message flow control. First, we simulated an ideal case in which there is no limit to the temporary storage capacity of the routing node. Then, a more realistic design, which requires a deadlock avoidance technique, is analysed. To ensure deadlock-free routing, the network is split into several virtual networks. Results show that adaptive routing is not a good choice for this kind of network. The torus topology shows potentially better results than the mesh. In any case, a different deadlock avoidance technique would have to be implemented to exploit these potential gains.
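The contrast between oblivious and adaptive routing on 2D meshes and tori can be sketched in Python. Dimension-order (XY) routing is used here as a common example of an oblivious policy. The abstract does not say which oblivious algorithm was simulated, so this choice, the function names and the tie-breaking rule in the torus are assumptions. The helper `productive_directions` lists the minimal moves among which an adaptive router could choose. It ignores flow control and the virtual-network deadlock-avoidance scheme studied in the paper.

```python
def step_toward(a, b, k, torus):
    """Signed unit step (+1, -1 or 0) along a shortest path from coordinate a to b."""
    if a == b:
        return 0
    if not torus:
        return 1 if b > a else -1
    forward = (b - a) % k  # hops needed going in the + direction
    return 1 if forward <= k - forward else -1  # ties broken towards +


def dimension_order_route(src, dst, k, torus):
    """Oblivious XY routing in a k x k mesh/torus: fix X fully, then Y.

    Returns the list of visited nodes, including src and dst.
    """
    pos = list(src)
    path = [tuple(pos)]
    for dim in (0, 1):
        while pos[dim] != dst[dim]:
            pos[dim] = (pos[dim] + step_toward(pos[dim], dst[dim], k, torus)) % k
            path.append(tuple(pos))
    return path


def productive_directions(cur, dst, k, torus):
    """Minimal-path directions an adaptive router could choose from at `cur`."""
    dirs = []
    for dim, name in ((0, "X"), (1, "Y")):
        s = step_toward(cur[dim], dst[dim], k, torus)
        if s:
            dirs.append(("+" if s > 0 else "-") + name)
    return dirs


if __name__ == "__main__":
    for torus in (False, True):
        route = dimension_order_route((0, 0), (7, 7), 8, torus)
        print("torus" if torus else "mesh", len(route) - 1, "hops")
```

The wrap-around links shorten corner-to-corner paths dramatically (2 hops instead of 14 in an 8 × 8 network). Whenever both coordinates differ, an adaptive router has two productive directions to choose from. The oblivious router always takes the first one.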
Euromicro Workshop on Parallel and Distributed Processing | 1995
J. Miguel; Agustin Arruabarrena; Cruz Izu; Ramon Beivide
An implementation of a conservative parallel simulator with deadlock avoidance is presented. Its performance with a realistic model of a message routing network is evaluated and contrasted with that of a sequential simulator. Different factors that improve the performance of the parallel simulation are discussed, focusing on the model under study and the available computer: a network of transputers. These factors are the load of the model being simulated, the grain size of the simulator, and the simulator's ability to exploit the lookahead property of the model.
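The core of a conservative simulator in the Chandy-Misra-Bryant style is a single rule. A logical process (LP) may only consume events whose timestamps do not exceed the smallest clock among its input channels. Null messages, which carry a lookahead-based time promise, keep LPs from blocking each other forever. The Python class below is a minimal sketch of that rule for one LP with at least one input channel. It assumes FIFO input channels and a fixed lookahead. It is not the simulator described in the paper, and all names are hypothetical.

```python
import heapq


class LogicalProcess:
    """One LP of a conservative simulation with null messages (sketch)."""

    def __init__(self, name, inputs, lookahead):
        self.name = name
        self.lookahead = lookahead
        # Latest timestamp seen on each input channel (channels are FIFO).
        self.channel_clock = {src: 0.0 for src in inputs}
        self.pending = []  # min-heap of (timestamp, seq, payload)
        self._seq = 0      # tie-breaker so payloads are never compared
        self.clock = 0.0

    def receive(self, src, timestamp, payload=None):
        """Accept a message from `src`; payload=None marks a null message."""
        if timestamp < self.channel_clock[src]:
            raise ValueError("FIFO channel delivered an out-of-order timestamp")
        self.channel_clock[src] = timestamp
        if payload is not None:
            heapq.heappush(self.pending, (timestamp, self._seq, payload))
            self._seq += 1

    def safe_time(self):
        """No future message can carry a timestamp below this value."""
        return min(self.channel_clock.values())

    def process_safe_events(self):
        """Consume, in timestamp order, every event that can no longer be preceded."""
        horizon = self.safe_time()
        done = []
        while self.pending and self.pending[0][0] <= horizon:
            ts, _, payload = heapq.heappop(self.pending)
            self.clock = ts
            done.append((ts, payload))
        return done

    def null_message_time(self):
        """Lower bound on any future output timestamp, sent as a null message."""
        next_event = self.pending[0][0] if self.pending else float("inf")
        return min(self.safe_time(), next_event) + self.lookahead
```

Usage: an LP with inputs "A" and "C" that has received a real message from A at time 5.0 must wait until C's channel clock also reaches 5.0. Until then it can only promise outputs no earlier than 0.0 + lookahead. A larger lookahead lets these promises advance faster, which matches the abstract's point about exploiting the lookahead property of the model.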
Journal of Systems Architecture | 1998
José M. Alonso; Agustin Arruabarrena; Ramón Beivide; José A. B. Fortes
A model of a message passing network is used to analyze the behavior of three implementations of the Chandy-Misra-Bryant (CMB) parallel simulation algorithm. The characteristics of the model, the organization of the logical processes (LPs) that constitute the simulator and the characteristics of the host parallel computer have a definite influence on the achieved performance, measured in terms of speedup. Large models with a high load help CMB synchronize with minimum overhead, efficiently exploiting the available parallelism. Mapping several LPs onto each processor makes better use of the available processing power, because while one LP is blocked (synchronizing), others can use the CPU. However, it is not advisable to map too many LPs onto each processor, because the synchronization cost would then be too high. The communication demands of CMB reduce its efficiency in environments where message passing is expensive: the performance of CMB running on a network of workstations is quite poor, whereas good speedups can be achieved on commercial multicomputers.
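The trade-off involved in mapping several LPs onto each processor can be illustrated with a simple count. The Python sketch below assumes, hypothetically, that the simulated network is a k × k mesh with one LP per router, and that LPs are block-mapped onto a square grid of processors. It counts how many model channels stay inside one processor and how many cross processors and therefore require real messages. The paper's actual models and mappings may differ, and the function names are illustrative.

```python
from itertools import product


def block_mapping(k, p_side):
    """Assign LP (i, j) of a k x k mesh model to one of p_side**2 processors
    in square blocks. Assumes p_side divides k."""
    if k % p_side:
        raise ValueError("p_side must divide k")
    block = k // p_side
    return {(i, j): (i // block) * p_side + (j // block)
            for i, j in product(range(k), repeat=2)}


def channel_split(k, mapping):
    """Count model channels (mesh links) that are local to one processor
    versus those that cross processors and need real messages."""
    local = remote = 0
    for i, j in product(range(k), repeat=2):
        for ni, nj in ((i + 1, j), (i, j + 1)):  # each link counted once
            if ni < k and nj < k:
                if mapping[(i, j)] == mapping[(ni, nj)]:
                    local += 1
                else:
                    remote += 1
    return local, remote


if __name__ == "__main__":
    for p_side in (4, 2, 1):
        lps_per_proc = (4 // p_side) ** 2
        print(lps_per_proc, "LPs/processor ->",
              channel_split(4, block_mapping(4, p_side)))
```

For a 4 × 4 model, going from one LP per processor to four cuts the cross-processor channels from 24 to 8. This is the effect that matters when message passing is expensive. The price is that each processor now has to interleave four LPs, so their synchronization delays add up.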
International Conference on Computational Science and Its Applications | 2014
Joseba Alberdi-Rodriguez; Micael J. T. Oliveira; Pablo García-Risueño; Fernando Nogueira; Javier Muguerza; Agustin Arruabarrena; Angel Rubio
In this work we present the improvements made to the Octopus code to reduce memory requirements and to optimise parallel data distribution. Both are central to the efficiency and feasibility of calculations when the system must be run in a large HPC environment. The modifications were made mainly in the real-space mesh partitioning and mapping algorithms, and are thus transferable to other codes that use this type of real-space representation of data. The code became much more efficient, and we present several scalability results showing that it is now possible to address ab initio quantum-mechanical simulations of the interaction of light with large biomolecules, paving the way for a better understanding of phenomena such as energy conversion in plants.
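Part of the memory cost of distributing a real-space mesh comes from the ghost (halo) points each process must store so that finite-difference stencils can be applied near domain boundaries. The following Python sketch estimates that overhead for the simplest possible decomposition: a split of an n × n × n grid into z-slabs with non-periodic boundaries. The slab geometry, the stencil half-width `order` and the function name are assumptions for illustration. They are not the partitioning algorithms developed in the paper.

```python
def slab_memory(n, parts, order):
    """Per-rank (owned points, halo points) for a z-slab split of an n^3 grid.

    Each rank owns a contiguous run of z-planes and stores `order` extra
    planes on every side that has a neighbouring rank (no periodicity).
    """
    plane = n * n
    base, extra = divmod(n, parts)
    result = []
    for r in range(parts):
        owned_planes = base + (1 if r < extra else 0)
        neighbours = (r > 0) + (r < parts - 1)
        result.append((owned_planes * plane, neighbours * order * plane))
    return result


if __name__ == "__main__":
    for parts in (1, 2, 5):
        print(parts, slab_memory(10, parts, order=4))
```

As the number of slabs grows, the halo of each rank stays fixed while its owned region shrinks. With a wide stencil, the halo can even exceed the owned data. Partitions with a smaller surface-to-volume ratio reduce this overhead.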
Microprocessors and Microsystems | 1992
Cruz Izu; Agustin Arruabarrena; Ramón Beivide
This work is focused on the analysis of the T800 transputer as the message manager of a multicomputer system. The studied topologies are a 4 × 4 mesh and a 4 × 4 torus. A model for the message manager has been developed and implemented in Occam. A random communication pattern and store-and-forward flow control have been used. Using this model, message latency and throughput have been studied for different traffic levels, ranging from null traffic up to the saturation point of the network. The influence of the message length on the network behaviour has also been considered. Finally, the theoretical upper bound of the throughput is discussed and compared with the experimental data.
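Two standard quantities give a rough picture of the 4 × 4 mesh versus 4 × 4 torus comparison: the average hop distance under uniform random traffic, and a channel-capacity bound on throughput. The Python sketch below computes both. It assumes minimal routing, one flit per cycle per unidirectional channel and perfectly balanced channel load. This bound is optimistic, especially for the mesh, whose central channels carry more traffic, and it is not necessarily the bound derived in the paper. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def hop_distance(a, b, k, torus):
    """Minimal hops between coordinates a and b along one dimension."""
    d = abs(a - b)
    return min(d, k - d) if torus else d


def average_distance(k, torus):
    """Mean hop count over ordered pairs of distinct nodes (uniform random traffic)."""
    nodes = list(product(range(k), repeat=2))
    total = sum(hop_distance(s[0], t[0], k, torus) + hop_distance(s[1], t[1], k, torus)
                for s in nodes for t in nodes)
    return Fraction(total, len(nodes) * (len(nodes) - 1))


def unidirectional_channels(k, torus):
    """Channels in a k x k mesh or torus (the torus formula assumes k >= 3)."""
    links_per_dimension = k * (k if torus else k - 1)
    return 2 * 2 * links_per_dimension  # 2 dimensions x 2 directions per link


def throughput_bound(k, torus):
    """Optimistic bound in flits/node/cycle: total channel capacity divided by
    the channel traversals needed per injected flit (balanced load assumed)."""
    n = k * k
    return unidirectional_channels(k, torus) / (n * average_distance(k, torus))


if __name__ == "__main__":
    for torus in (False, True):
        print("torus" if torus else "mesh",
              average_distance(4, torus), throughput_bound(4, torus))
```

Under store-and-forward flow control, each hop retransmits the whole message. Zero-load latency is therefore roughly proportional to the average distance times the message length. The torus's shorter average distance (32/15 versus 8/3 hops) and extra channels favour it in both latency and throughput.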
International Conference on Supercomputing | 1998
Cruz Izu; Agustin Arruabarrena
Communications in a multicomputer system consist of heterogeneous traffic in which messages exhibit a variety of sizes. Network response is highly dependent on the message length distribution, as reported in most network evaluation studies. Hence, router design should be optimized for heterogeneous traffic. This study analyzes the interaction between short and long messages in networks with bimodal length traffic: single-packet messages (20 flits) and multipacket messages (200 flits). Router design determines how both traffic classes access the network resources, so we have considered multiple alternatives, from the generic static cut-through (CT) or wormhole (WH) router to variations of the segment router proposed in [9], which maps the two traffic classes onto two separate virtual networks. Having independent injection queues for each virtual network and adjusting the channel multiplexing policy to favour short or long messages provides good performance and added flexibility compared with its cut-through counterpart.
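A channel multiplexing policy between two virtual networks can be sketched as a weighted round-robin arbiter over one physical channel. In the Python sketch below, the weights, the work-conserving rule (an idle virtual network's slot goes to the other one) and the function name are assumptions. The router variants evaluated in the paper, including the segment router cited in the abstract, are not reproduced here.

```python
from collections import deque


def multiplex(short_q, long_q, short_weight, long_weight, cycles):
    """Weighted round-robin flit multiplexing of one physical channel.

    short_q / long_q are deques of flits for the two virtual networks. Slot
    c follows the repeating pattern S*short_weight + L*long_weight; if the
    selected virtual network is empty, the slot goes to the other one.
    Returns the flits sent, in order.
    """
    if short_weight < 0 or long_weight < 0 or short_weight + long_weight == 0:
        raise ValueError("weights must be non-negative and not both zero")
    pattern = ["S"] * short_weight + ["L"] * long_weight
    sent = []
    for cycle in range(cycles):
        if not short_q and not long_q:
            break  # channel idle: nothing left to send
        if pattern[cycle % len(pattern)] == "S":
            first, second = short_q, long_q
        else:
            first, second = long_q, short_q
        queue = first if first else second  # work-conserving fallback
        sent.append(queue.popleft())
    return sent


if __name__ == "__main__":
    # One 20-flit short message competing with one 200-flit long message.
    out = multiplex(deque(["s"] * 20), deque(["l"] * 200), 3, 1, 40)
    print(out.count("s"), "short flits,", out.count("l"), "long flits")
```

With weights 3:1 and both virtual networks backlogged, short-message flits get three of every four cycles, so a 20-flit message clears quickly even behind a 200-flit one. Reversing the weights favours long messages instead. This is the kind of tunable policy that the abstract credits with added flexibility.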