Valentin Puente | Researchain

Publication

Featured research published by Valentin Puente.

Journal of Parallel and Distributed Computing | 2001

The Adaptive Bubble Router

Valentin Puente; Cruz Izu; Ramón Beivide; José-Ángel Gregorio; Fernando Vallejo; J. M. Prellezo

The design of a new adaptive virtual cut-through router for torus networks is presented in this paper. With much lower VLSI costs than adaptive wormhole routers, the adaptive Bubble router is even faster than deterministic wormhole routers based on virtual channels. This has been achieved by combining a low-cost deadlock avoidance mechanism for virtual cut-through networks, called Bubble flow control, with an adequate design of the router's arbiter. A thorough methodology has been employed to quantify the impact that this router design has at all levels, from its hardware cost to the system performance when running parallel applications. At the VLSI level, our proposal is the adaptive router with the shortest clock cycle and node delay when compared with other state-of-the-art alternatives. This translates into the lowest latency and highest throughput under standard synthetic loads. At system level, these gains reduce the execution time of the benchmarks considered. Compared with current adaptive wormhole routers, the execution time is reduced by up to 27%. Furthermore, this is the only router that improves system performance when compared with simpler static designs.

International Conference on Parallel Processing | 1999

Adaptive bubble router: a design to improve performance in torus networks

Valentin Puente; Ramón Beivide; José A. Gregorio; J. M. Prellezo; José Duato; Cruz Izu

A router design for torus networks that significantly reduces message latency over traditional wormhole routers is presented in this paper. This new router implements virtual cut-through switching and fully-adaptive minimal routing. Packet deadlock is avoided by providing escape ways governed by Bubble flow control, a mechanism that guarantees enough free buffer space in the network to allow continuous packet movement. Both deterministic and adaptive Bubble routers have been designed in VLSI using VHDL synthesis tools. Adopting a fair quantitative comparison, we demonstrate that Bubble routers exhibit a reduction in base latency values over 40% with respect to the corresponding wormhole routers, without any penalty in network throughput. With much lower VLSI costs than adaptive wormhole routers, the adaptive Bubble router is even faster than deterministic wormhole routers based on virtual channels.
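The admission rule behind Bubble flow control can be sketched in a few lines. This is a minimal illustration, not the authors' hardware design. It assumes the commonly described form of the rule, in which a packet entering a ring (by injection or by changing dimension) needs two free packet-sized buffers in the receiving queue, while a packet already travelling along the ring needs only one. The abstract itself states only that enough free buffer space is guaranteed. The names `RingQueue` and `may_advance` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RingQueue:
    """Input queue of a router port along one ring (hypothetical model)."""
    capacity: int      # number of packet-sized buffer slots
    occupied: int = 0  # slots currently holding packets

    @property
    def free_slots(self) -> int:
        return self.capacity - self.occupied


def may_advance(queue: RingQueue, entering_ring: bool) -> bool:
    """Decide whether a packet may move into `queue`.

    Assumed rule: a packet entering the ring must leave at least one
    free slot (the "bubble") behind, so it needs two free slots.
    A packet already in the ring only needs room for itself.
    """
    required = 2 if entering_ring else 1
    return queue.free_slots >= required


# Usage: with a single free slot, in-ring traffic keeps moving,
# but new packets are held back so the ring never fills completely.
almost_full = RingQueue(capacity=4, occupied=3)
in_ring_ok = may_advance(almost_full, entering_ring=False)
injection_ok = may_advance(almost_full, entering_ring=True)
```

Under this assumption, holding back entering packets is what keeps at least one buffer free somewhere in the ring, which is the property the abstract credits with allowing continuous packet movement.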

International Symposium on Computer Architecture | 2004

Immunet: A Cheap and Robust Fault-Tolerant Packet Routing Mechanism

Valentin Puente; José A. Gregorio; Fernando Vallejo; Ramón Beivide

A new and efficient mechanism to tolerate failures in interconnection networks for parallel and distributed computers, denoted as Immunet, is presented in this work. In the presence of failures, Immunet automatically reacts with a hardware reconfiguration of the surviving network resources. Immunet has four important advantages over previous fault-tolerant switching mechanisms. Its low hardware costs minimize the overhead that the network must support in the absence of faults. As long as the network remains connected, Immunet can tolerate any number of failures regardless of their spatial and temporal combinations. The resulting communication infrastructure provides optimized adaptive minimal routing over the surviving topology. The system behavior under successive failures exhibits graceful performance degradation. Immunet reconfiguration can be totally transparent to the applications running on the parallel system as they will only be affected by the loss of those data packets circulating through the broken components. The rest of the packets will suffer only a tolerable delay induced by the time employed to perform the automatic network reconfiguration. Descriptions of the hardware network architecture and detailed synthetic and execution-driven simulations will demonstrate the benefits of Immunet.

Parallel, Distributed and Network-Based Processing | 2002

SICOSYS: an integrated framework for studying interconnection network performance in multiprocessor systems

Valentin Puente; José A. Gregorio; Ramón Beivide

An environment has been developed which is capable of determining the impact that a multiprocessor interconnection subsystem causes on real application execution time. A general-purpose interconnection network simulator, called SICOSYS, able to capture essential aspects of the low-level implementation, has been integrated into two execution driven simulators for multiprocessors: RSIM and SimOS. The enhancement of both tools allows the analysis of new proposals for the interconnection subsystem of a cc-NUMA machine, from the VLSI level up to the real application level. Any new proposal can be translated to a specific message router architecture and by using a low-level implementation tool, the parameter delays of a detailed router model to be used by SICOSYS can be obtained.

International Symposium on Computer Architecture | 2007

Rotary router: an efficient architecture for CMP interconnection networks

Pablo Abad; Valentin Puente; José A. Gregorio; Pablo Prieto

The trend towards increasing the number of processor cores and cache capacity in future Chip-Multiprocessors (CMPs) will require scalable packet-switched interconnection networks adapted to the restrictions imposed by the CMP environment. This paper presents an innovative router design, which successfully addresses CMP cost/performance constraints. The router structure is based on two independent rings, which force packets to circulate either clockwise or anti-clockwise, traveling through every port of the router. It uses a completely decentralized scheduling scheme, which allows the design to: (1) take advantage of wide links, (2) reduce Head of Line blocking, (3) use adaptive routing, (4) be topology agnostic, (5) scale with network degree, and (6) have reasonable power consumption and implementation cost. A thorough comparative performance analysis against competitive conventional routers shows an advantage for our proposal of up to 50% in terms of raw performance and nearly 60% in terms of energy-delay product.

Networks on Chips | 2012

TOPAZ: An Open-Source Interconnection Network Simulator for Chip Multiprocessors and Supercomputers

Pablo Abad; Pablo Prieto; Lucia G. Menezo; Adrián Colaso; Valentin Puente; José-Ángel Gregorio

As in other computer architecture areas, interconnection network research relies most of the time on simulation tools. This paper announces the release of an open-source tool suitable for accurate modeling of interconnection networks ranging from small CMPs to large supercomputers. The cycle-accurate modeling of TOPAZ can be used standalone through synthetic traffic patterns and application traces, or effortlessly within full-system evaluation systems such as GEMS or GEM5. In fact, we provide an advanced interface that enables the replacement of the original lightweight but optimistic GEMS and GEM5 network simulator with limited performance impact on the simulation time. Our tests indicate that in this context, underestimating network modeling could induce up to 50% error in the performance estimation of the simulated system. To minimize the impact of detailed network modeling on simulation time, we incorporate mechanisms able to attenuate the higher computational effort, reducing in this way the slowdown of the full system simulation with accurate performance estimations. Additionally, in order to evaluate large-scale networks, we parallelize the simulator to be able to optimize memory resources with the growing number of cores available per chip in the simulation farms. This allows us to simulate networks exceeding one million routers with up to 70% efficiency in a multithreaded simulation running on twelve cores.

High-Performance Computer Architecture | 2009

MRR: Enabling fully adaptive multicast routing for CMP interconnection networks

Pablo Abad; Valentin Puente; José-Ángel Gregorio

On-network hardware support for multi-destination traffic is a desirable feature in most multiprocessor machines. Multicast hardware capabilities enable much more effective bandwidth utilization as multi-destination packets do not need to repeatedly use the same resources, as occurs when multicast traffic must be decomposed into unicast packets. Although Chip Multiprocessors are not an exception in this interest, to date, few fitting proposals exist. The combination of the scarcity of available resources and the common idea that multicast support requires a substantial amount of extra resources is responsible for this situation. In this work, we propose a new approach suitable for on-chip networks capable of managing multi-destination traffic via hardware in an efficient way with negligible complexity. We introduce the Multicast Rotary Router (MRR), a router able to: (1) perform on-network multicast support with almost zero cost over the Rotary Router, (2) use a fully adaptive tree to distribute multicast traffic, (3) perform on-network congestion control extending network utilization range. The performance results, using a state-of-the-art full system simulation framework, show that it improves average full system performance of a CMP using a unicast Rotary Router in its interconnection network by 25%, and an input buffered router with multicast support by 20%.

High-Performance Computer Architecture | 2010

ESP-NUCA: A low-cost adaptive Non-Uniform Cache Architecture

Javier Merino; Valentin Puente; José-Ángel Gregorio

This paper introduces a cost-effective cache architecture called Enhanced Shared-Private Non-Uniform Cache Architecture (ESP-NUCA), which is suitable for high-performance Chip MultiProcessors (CMPs). This architecture enhances system stability by combining the advantages of private and shared caches. Starting from a shared NUCA, ESP-NUCA introduces a low-cost mechanism to dynamically allocate private cache blocks closer to their owner processor. In this way, average on-chip access latency is reduced and inter-core interference minimized. ESP-NUCA synergistically integrates victims and replicas thus making it possible to take advantage of multiple-readers for shared data, and to maximize cache usage under unbalanced core utilization. This architecture leads to stable behavior within the whole system across a broad spectrum of working scenarios. ESP-NUCA not only outperforms architectures with similar implementation costs such as private and shared caches by up to 20% and 40% respectively, but even outperforms much costlier architectures such as D-NUCA [13] by up to 28%, Adaptive Selective Replication [3] by up to 19%, and Cooperative Caching [5] by up to 15%. Moreover, performance variance throughout the set of benchmarks is 37% lower than with ASR, 87% lower than with D-NUCA, and 43% lower than with Cooperative Caching.

IEEE Transactions on Parallel and Distributed Systems | 2007

Immucube: Scalable Fault-Tolerant Routing for k-ary n-cube Networks

Valentin Puente; José A. Gregorio

This work presents Immucube, a scalable and efficient mechanism to improve dependability of interconnection networks for parallel and distributed computers. Immucube achieves better flexibility and scalability than any other previous fault-tolerant mechanism in k-ary n-cubes. The proposal inherits from Immunet several advantages over other previous fault-tolerant routing algorithms: 1) allowing any temporal and spatial fault combination, 2) permitting automatic and application-transparent reconfiguration after any fault, and 3) requiring a negligible overhead in the absence of faults. Immucube introduces new important features, such as: 4) providing graceful performance degradation, even in very large interconnection networks, 5) tolerating transparent resource utilization after transitory faults or partial repair of faulty resources, 6) being able to deal with intermittent faults, and 7) being able to dynamically recover the original network performance when all the failed components have been repaired.

International Parallel and Distributed Processing Symposium | 2003

A low cost fault tolerant packet routing for parallel computers

Valentin Puente; José A. Gregorio; Ramón Beivide; Fernando Vallejo

This paper presents a new switching mechanism to tolerate arbitrary faults in interconnection networks with a negligible implementation cost. Although our routing technique can be applied to any regular or irregular topology, in this paper we focus on its application to k-ary n-cube networks when managing both synthetic and real traffic workloads. Our mechanism is effective regardless of the number of faults and their configuration. When the network is working without any fault, no overhead is added to the original routing scheme. In the presence of a low number of faults, the network sustains a performance close to that observed under fault-free conditions. Finally, when the number of faults increases, the system exhibits a graceful performance degradation.
