Javier Navaridas
University of Manchester
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Javier Navaridas.
international conference on supercomputing | 2009
Javier Navaridas; Mikel Luján; José Miguel-Alonso; Luis A. Plana; Steve B. Furber
SpiNNaker is a massively parallel architecture designed to model large-scale spiking neural networks in (biological) real-time. Its design is based around ad-hoc multi-core System-on-Chips which are interconnected using a two-dimensional toroidal triangular mesh. Neurons are modeled in software and their spikes generate packets that propagate through the on- and inter-chip communication fabric relying on custom-made on-chip multicast routers. This paper models and evaluates large-scale instances of its novel interconnect (more than 65 thousand nodes, or over one million computing cores), focusing on real-time features and fault-tolerance. The key contribution can be summarized as understanding the properties of the feasible topologies and establishing the stable operation of the SpiNNaker under different levels of degradation. First we derive analytically the topological characteristics of the network, which are later confirmed by experimental work. With the computational model developed, we investigate the topology of SpiNNaker, and compare it with a standard 3-dimensional torus. The novel emergency routing mechanism, implemented within the routers, allows the topology of SpiNNaker to be more robust than the 3-dimensional torus, regardless of the latter having better topological characteristics. Furthermore, we obtain optimal values of two router parameters related with livelock and deadlock avoidance mechanisms.
IEEE Transactions on Parallel and Distributed Systems | 2010
José María Cámara; Miquel Moreto; Enrique Vallejo; Ramón Beivide; José Miguel-Alonso; Carmen Martínez; Javier Navaridas
Many current parallel computers are built around a torus interconnection network. Machines from Cray, HP, and IBM, among others, make use of this topology. In terms of topological advantages, square (2D) or cubic (3D) tori would be the topologies of choice. However, for different practical reasons, 2D and 3D tori with different number of nodes per dimension have been used. These mixed-radix topologies are not edge symmetric, which translates into poor performance due to an unbalanced use of network resources. In this work, we analyze twisted 2D and 3D mixed-radix tori that remove the network bottlenecks present in nontwisted ones. Such topologies recover edge symmetry, and consequently, balance the utilization of their links. The distance-related properties of twisted tori together with a full characterization of their bisection bandwidth are described in this paper. A simulation-based performance evaluation has been carried out to assess the network performance under synthetic and trace-driven workloads. The obtained results show noticeable and consistent performance gains (up to an increase of 74 percent in accepted load). In addition, we propose scalable and practicable packet routing mechanisms and wiring layouts for these interconnection systems. The complexity of the architectural proposals is similar to the one exhibited by routing and folding mechanisms in standard tori.
Simulation Modelling Practice and Theory | 2011
Javier Navaridas; José Miguel-Alonso; Jose Antonio Pascual; Francisco Javier Ridruejo
This paper describes INSEE, a simulation framework developed at the University of the Basque Country. INSEE is designed to carry out performance-related studies of interconnection networks. It is composed of two main modules: a Functional Simulator of Interconnection Networks (FSIN) and a TRaffic GENeration module (TrGen), together with several other modules that extend INSEE’s functionality. The router models and topologies implemented in FSIN are included in this description. Likewise, the available methods to generate traffic are described thoroughly. Finally, the resource consumption of INSEE when managing different systems and workloads is also evaluated. INSEE has been used to conduct a variety of studies, including evaluation of novel topologies, diagnosis of causes of network congestion and evaluation of parallel scheduling algorithms.
field programmable logic and applications | 2014
Oriol Arcas-Abella; Geoffrey Ndu; Nehir Sonmez; Mohsen Ghasempour; Adrià Armejach; Javier Navaridas; Wei Song; John Mawer; Adrián Cristal; Mikel Luján
High Level Synthesis (HLS) languages and tools are emerging as the most promising technique to make FPGAs more accessible to software developers. Nevertheless, picking the most suitable HLS for a certain class of algorithms depends on requirements such as area and throughput, as well as on programmer experience. In this paper, we explore the different trade-offs present when using a representative set of HLS tools in the context of Database Management Systems (DBMS) acceleration. More specifically, we conduct an empirical analysis of four representative frameworks (Bluespec SystemVerilog, Altera OpenCL, LegUp and Chisel) that we utilize to accelerate commonly-used database algorithms such as sorting, the median operator, and hash joins. Through our implementation experience and empirical results for database acceleration, we conclude that the selection of the most suitable HLS depends on a set of orthogonal characteristics, which we highlight for each HLS framework.
job scheduling strategies for parallel processing | 2009
Jose Antonio Pascual; Javier Navaridas; José Miguel-Alonso
This paper studies the influence that job placement may have on scheduling performance, in the context of massively parallel computing systems. A simulation-based performance study is carried out, using workloads extracted from real systems logs. The starting point is a parallel system built around a k-ary n-tree network and using well-known scheduling algorithms (FCFS and backfilling). We incorporate an allocation policy that tries to assign to each job a contiguous network partition, in order to improve communication performance. This policy results in severe scheduling inefficiency due to increased system fragmentation. A relaxed version of it, which we call quasi-contiguous allocation, reduces this adverse effect. Experiments show that, in those cases where the exploitation of communication locality results in an effective reduction of application execution time, the achieved gains more than compensate the scheduling inefficiency, therefore resulting in better overall performance.
International Journal of Parallel Programming | 2009
José Miguel-Alonso; Javier Navaridas; Francisco Javier Ridruejo
This paper addresses the utilization of traces taken from MPI applications to do simulation-based performance studies of parallel computing systems. Different mechanisms to capture traces are discussed, pointing out important limitations of some of them. One of these limitations is the invisibility of message interchanges in collective operations, which is circumvented modifying a trace-capturing library. During a simulation, trace records must be simulated in causal order, to fully comply with application semantics. Alternatives to follow this order, and the risks of not following it, are presented and discussed. The techniques introduced in this paper have been implemented in an in-house developed simulation environment, which is used in two example studies to show its usefulness: an evaluation of alternatives for interconnection network design, and a performance prediction study in which traces from one machine are used to estimate the execution times of applications running in a different machine.
parallel computing | 2010
Javier Navaridas; José Miguel-Alonso; Francisco Javier Ridruejo; Wolfgang E. Denzel
Interconnection networks based on the k-ary n-tree topology are widely used in high-performance parallel computers. However, this topology is expensive and complex to build. In this paper we evaluate an alternative tree-like topology that is cheaper in terms of cost and complexity because it uses fewer switches and links. This alternative topology leaves unused upward ports on switches, which can be rearranged to be used as downward ports. The increase of locality might be efficiently exploited by applications. We test the performance of these thin-trees, and compare it with that of regular trees. Evaluation is carried out using a collection of synthetic traffic patterns that emulate the behavior of scientific applications and functions within message passing libraries, not only in terms of sources and destinations of messages, but also considering the causal relationships among them. We also propose a methodology to perform cost and performance analysis of different networks. Our main conclusion is that, for the set of studied workloads, the performance drop in thin-trees is less noticeable than the cost savings.
parallel, distributed and network-based processing | 2009
Javier Navaridas; Jose Antonio Pascual; José Miguel-Alonso
This paper studies the influence that task placement may have on the performance of applications, mainly due to the relationship between communication locality and overhead. This impact is studied for torus and fat-tree topologies. A simulation-based performance study is carried out, using traces of applications and application kernels, to measure the time taken to complete one or several concurrent instances of a given workload. As the purpose of the paper is not to offer a miraculous task placement strategy, but to measure the impact that placement have on performance, we selected simple strategies, including random placement. The quantitative results of these experiments show that different workloads present different degrees of responsiveness to placement. Furthermore, both the number of concurrent parallel jobs sharing a machine and the size of its network has a clear impact on the time to complete a given workload. We conclude that the efficient exploitation of a parallel computer requires the utilization of scheduling policies aware of application behavior and network topology.
Journal of Parallel and Distributed Computing | 2012
Cameron Patterson; Jim D. Garside; Eustace Painkras; Steve Temple; Luis A. Plana; Javier Navaridas; Thomas Sharp; Steve B. Furber
The design of a new high-performance computing platform to model biological neural networks requires scalable, layered communications in both hardware and software. SpiNNakers hardware is based upon Multi-Processor System-on-Chips (MPSoCs) with flexible, power-efficient, custom communication between processors and chips. The architecture scales from a single 18-processor chip to over 1 million processors and to simulations of billion-neuron, trillion-synapse models, with tens of trillions of neural spike-event packets conveyed each second. The communication networks and overlying protocols are key to the successful operation of the SpiNNaker architecture, designed together to maximise performance and minimise the power demands of the platform. SpiNNaker is a work in progress, having recently reached a major milestone with the delivery of the first MPSoCs. This paper presents the architectural justification, which is now supported by preliminary measured results of silicon performance, indicating that it is indeed scalable to a million-plus processor system.
international parallel and distributed processing symposium | 2008
Javier Navaridas; José Miguel-Alonso; Francisco Javier Ridruejo
Evaluation of high performance parallel systems is a delicate issue, due to the difficulty of generating workloads that represent, with fidelity, those that will run on actual systems. In this paper we make an overview of the most usual methodologies used to generate workloads for performance evaluation purposes, focusing on the network: random traffic, patterns based on permutations, traces, execution-driven, etc. In order to fill the gap between purely synthetic and application- driven workloads, we present a set of pseudo-synthetic workloads that mimic applications behavior, emulating some widely-used implementations of MPI collectives and some communication patterns commonly used in scientific applications. This mimicry is done in terms of spatial distribution of messages as well as in the causal relationship among them. As an example of the proposed methodology, we use a subset of these workloads to confront tori and fat-trees.