José A. Gregorio | Researchain

Publication

Featured research published by José A. Gregorio.

International Conference on Parallel Processing | 1999

Adaptive bubble router: a design to improve performance in torus networks

Valentin Puente; Ramón Beivide; José A. Gregorio; J. M. Prellezo; José Duato; Cruz Izu

A router design for torus networks that significantly reduces message latency compared with traditional wormhole routers is presented in this paper. This new router implements virtual cut-through switching and fully adaptive minimal routing. Packet deadlock is avoided by providing escape paths governed by Bubble flow control, a mechanism that guarantees enough free buffer space in the network to allow continuous packet movement. Both deterministic and adaptive Bubble routers have been designed in VLSI using VHDL synthesis tools. Using a fair quantitative comparison, we demonstrate that Bubble routers reduce base latency by more than 40% with respect to the corresponding wormhole routers, without any penalty in network throughput. With much lower VLSI costs than adaptive wormhole routers, the adaptive Bubble router is even faster than deterministic wormhole routers based on virtual channels.
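The Bubble flow control condition can be illustrated with a minimal Python sketch. It assumes the rule as it is usually described for virtual cut-through rings, with buffer capacity counted in whole packets. A packet already moving along a ring needs one free packet slot in the next queue. A packet entering the ring (at injection or on a dimension change) needs two, so a free slot (the "bubble") is always left behind. The names `RingQueue`, `bubble_allows` and `fill_by_injection` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class RingQueue:
    """Input buffer of one router in a unidirectional ring; sizes are in packets."""
    capacity: int
    occupied: int = 0

    @property
    def free(self) -> int:
        return self.capacity - self.occupied


def bubble_allows(next_queue: RingQueue, entering_ring: bool) -> bool:
    """Bubble condition (sketch, assumed rule).

    In-transit packets only need room for themselves. Packets entering the
    ring must find room for two packets, leaving at least one free slot behind.
    """
    needed = 2 if entering_ring else 1
    return next_queue.free >= needed


def fill_by_injection(ring: list[RingQueue]) -> int:
    """Inject packets greedily at every node until the bubble rule blocks them all."""
    injected = 0
    progress = True
    while progress:
        progress = False
        for queue in ring:
            if bubble_allows(queue, entering_ring=True):
                queue.occupied += 1
                injected += 1
                progress = True
    return injected
```

With four queues of capacity two, injection stops after four packets, and every queue can still accept an in-transit packet, so traffic already in the ring can keep moving however hard the nodes try to inject.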

International Symposium on Computer Architecture | 2004

Immunet: A Cheap and Robust Fault-Tolerant Packet Routing Mechanism

Valentin Puente; José A. Gregorio; Fernando Vallejo; Ramón Beivide

A new and efficient mechanism for tolerating failures in interconnection networks for parallel and distributed computers, denoted Immunet, is presented in this work. In the presence of failures, Immunet automatically reacts with a hardware reconfiguration of the surviving network resources. Immunet has four important advantages over previous fault-tolerant switching mechanisms. Its low hardware cost minimizes the overhead that the network must support in the absence of faults. As long as the network remains connected, Immunet can tolerate any number of failures regardless of their spatial and temporal combinations. The resulting communication infrastructure provides optimized adaptive minimal routing over the surviving topology. The system behavior under successive failures exhibits graceful performance degradation. Immunet reconfiguration can be totally transparent to the applications running on the parallel system, as they will only be affected by the loss of the data packets circulating through the broken components. The rest of the packets will suffer only a tolerable delay induced by the time employed to perform the automatic network reconfiguration. Descriptions of the hardware network architecture and detailed synthetic and execution-driven simulations demonstrate the benefits of Immunet.
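Immunet's guarantee is conditioned on the surviving network remaining connected. The sketch below checks only that precondition, not the reconfiguration mechanism itself: a breadth-first search over a k x k torus from which a set of bidirectional links has been removed. The torus topology and the helper names (`torus_neighbors`, `links_of`, `is_connected`) are illustrative assumptions.

```python
from collections import deque

Node = tuple[int, int]


def torus_neighbors(node: Node, k: int) -> list[Node]:
    """The four neighbours of a node in a k x k torus, wrap-around links included."""
    x, y = node
    return [((x + 1) % k, y), ((x - 1) % k, y), (x, (y + 1) % k), (x, (y - 1) % k)]


def links_of(node: Node, k: int) -> list[frozenset]:
    """All links attached to a node; an undirected link is the frozenset of its ends."""
    return [frozenset((node, nb)) for nb in torus_neighbors(node, k)]


def is_connected(k: int, failed_links: set) -> bool:
    """True if all k*k nodes are still reachable from (0, 0) without the failed links."""
    start: Node = (0, 0)
    seen = {start}
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        for nb in torus_neighbors(node, k):
            if nb in seen or frozenset((node, nb)) in failed_links:
                continue
            seen.add(nb)
            frontier.append(nb)
    return len(seen) == k * k
```

For example, failing three of the four links around node (1, 1) in a 4 x 4 torus leaves the network connected, so a mechanism with Immunet's stated property should still deliver traffic to every node; failing all four isolates the node, which falls outside that guarantee.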

Parallel, Distributed and Network-Based Processing | 2002

SICOSYS: an integrated framework for studying interconnection network performance in multiprocessor systems

Valentin Puente; José A. Gregorio; Ramón Beivide

An environment has been developed that is capable of determining the impact that a multiprocessor interconnection subsystem has on real application execution time. A general-purpose interconnection network simulator, called SICOSYS, able to capture essential aspects of the low-level implementation, has been integrated into two execution-driven simulators for multiprocessors: RSIM and SimOS. The enhancement of both tools allows the analysis of new proposals for the interconnection subsystem of a cc-NUMA machine, from the VLSI level up to the real application level. Any new proposal can be translated into a specific message router architecture, and a low-level implementation tool can then be used to obtain the delay parameters of the detailed router model that SICOSYS uses.

International Symposium on Computer Architecture | 2007

Rotary router: an efficient architecture for CMP interconnection networks

Pablo Abad; Valentin Puente; José A. Gregorio; Pablo Prieto

The trend towards increasing the number of processor cores and cache capacity in future chip multiprocessors (CMPs) will require scalable packet-switched interconnection networks adapted to the restrictions imposed by the CMP environment. This paper presents an innovative router design that successfully addresses CMP cost/performance constraints. The router structure is based on two independent rings, which force packets to circulate either clockwise or anticlockwise, traveling through every port of the router. It uses a completely decentralized scheduling scheme, which allows the design to: (1) take advantage of wide links, (2) reduce head-of-line blocking, (3) use adaptive routing, (4) be topology agnostic, (5) scale with network degree, and (6) have reasonable power consumption and implementation cost. A thorough comparative performance analysis against competitive conventional routers shows an advantage for our proposal of up to 50% in raw performance and nearly 60% in energy-delay product.
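The abstract describes packets circling the router on one of two rings, passing every port, with decentralized scheduling and adaptive routing. A minimal sketch of how such a packet might pick its exit, under assumptions that are ours and not the paper's: ports are numbered around the ring, the routing function supplies a set of productive output ports, and the packet leaves at the first productive port that is not busy. `first_exit` and its parameters are hypothetical.

```python
from typing import Optional


def first_exit(in_port: int, productive: set[int], busy: set[int],
               ports: int, clockwise: bool) -> Optional[int]:
    """Walk the ring from in_port and return the first productive, idle output port.

    Returns None if the packet passes every other port without finding one;
    in this sketch it would then keep circulating and try again.
    """
    step = 1 if clockwise else -1
    for hop in range(1, ports):
        port = (in_port + step * hop) % ports  # Python's % keeps the index non-negative
        if port in productive and port not in busy:
            return port
    return None
```

Because each packet decides locally as it reaches a port, no central arbiter appears in this model, and adaptivity comes from accepting any idle productive port. For instance, on a six-port router a packet entering at port 0 with productive ports {2, 4} exits at port 2 when travelling clockwise, or at port 4 if port 2 is busy.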

IEEE Parallel & Distributed Technology: Systems & Applications | 1996

Assessing the performance of the new IBM SP2 communication subsystem

José Miguel; Agustin Arruabarrena; Ramón Beivide; José A. Gregorio

This evaluation shows the effect that the recent upgrade to the IBM SP2 communication subsystem has on the execution of parallel applications, indicating that only under certain circumstances does a significant performance increase result.

IEEE Transactions on Parallel and Distributed Systems | 2007

Immucube: Scalable Fault-Tolerant Routing for k-ary n-cube Networks

Valentin Puente; José A. Gregorio

This work presents Immucube, a scalable and efficient mechanism to improve the dependability of interconnection networks for parallel and distributed computers. Immucube achieves better flexibility and scalability than any previous fault-tolerant mechanism in k-ary n-cubes. The proposal inherits from Immunet several advantages over previous fault-tolerant routing algorithms: 1) allowing any temporal and spatial fault combination, 2) permitting automatic and application-transparent reconfiguration after any fault, and 3) requiring negligible overhead in the absence of faults. Immucube introduces important new features, such as: 4) providing graceful performance degradation, even in very large interconnection networks, 5) tolerating transparent resource utilization after transitory faults or partial repair of faulty resources, 6) being able to deal with intermittent faults, and 7) being able to dynamically recover the original network performance when all the failed components have been repaired.

International Parallel and Distributed Processing Symposium | 2003

A low cost fault tolerant packet routing for parallel computers

Valentin Puente; José A. Gregorio; Ramón Beivide; Fernando Vallejo

This paper presents a new switching mechanism to tolerate arbitrary faults in interconnection networks with a negligible implementation cost. Although our routing technique can be applied to any regular or irregular topology, in this paper we focus on its application to k-ary n-cube networks when managing both synthetic and real traffic workloads. Our mechanism is effective regardless of the number of faults and their configuration. When the network is working without any fault, no overhead is added to the original routing scheme. In the presence of a low number of faults, the network sustains a performance close to that observed under fault-free conditions. Finally, when the number of faults increases, the system exhibits graceful performance degradation.

ACM SIGARCH Computer Architecture News | 2008

SP-NUCA: a cost effective dynamic non-uniform cache architecture

Javier Merino; Valentin Puente; Pablo Prieto; José A. Gregorio

This paper presents a simple but effective method to reduce on-chip access latency and improve core isolation in CMP Non-Uniform Cache Architectures (NUCA). The paper introduces a feasible way to allocate cache blocks according to the access pattern. Each L2 bank is dynamically partitioned at set level into private and shared content. Simply by adjusting the replacement algorithm, we can place private data closer to its owner processor. In contrast, shared data is always placed in the same position, independently of the accessing processor. This approach is capable of reducing on-chip latency without significantly sacrificing hit rates or increasing the implementation cost of a conventional static NUCA. Additionally, most of the unnecessary interference between cores in private accesses is removed. To support the architectural decisions adopted and provide a comparative study, a comprehensive evaluation framework is employed. The workbench is composed of a full-system simulator and a representative set of multithreaded and multiprogrammed workloads. With this infrastructure, different alternatives for the coherence protocol, replacement policies, and cache utilization are analyzed to find the optimal proposal. We conclude that the cost of a feasible implementation should be close to that of a conventional static NUCA, and significantly less than that of a dynamic NUCA. Finally, a comparison with static and dynamic NUCA is presented. The simulation results suggest that, on average, the proposed mechanism could improve system performance over a static NUCA and an idealized dynamic NUCA by 16% and 6%, respectively.
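The placement idea can be pictured as a bank-selection function. The sketch below is a deliberately simplified model, not the paper's mechanism (which partitions each bank at set level and adjusts the replacement algorithm): private blocks map to a bank assumed to sit next to the owning core, and shared blocks keep the address-interleaved home bank of a static NUCA. The 64-byte line, the eight banks with one per core, and the function names are assumptions.

```python
LINE_BYTES = 64  # assumed cache line size
NUM_BANKS = 8    # assumed: one L2 bank adjacent to each of 8 cores


def static_bank(addr: int) -> int:
    """Static-NUCA style mapping: the low-order line-address bits select the bank."""
    return (addr // LINE_BYTES) % NUM_BANKS


def sp_nuca_bank(addr: int, owner_core: int, is_private: bool) -> int:
    """Simplified placement model (assumption, not the paper's exact logic).

    Private blocks go to the owner's local bank; shared blocks stay in their
    static home bank, so their location does not depend on the accessing core.
    """
    return owner_core % NUM_BANKS if is_private else static_bank(addr)
```

For example, with these constants the block at 0x1040 has static home bank 1 whichever core touches it, whereas a private block owned by core 5 would be placed in bank 5, next to its owner.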

IEEE Transactions on Computers | 2008

Immunet: Dependable Routing for Interconnection Networks with Arbitrary Topology

Valentin Puente; José A. Gregorio; Fernando Vallejo; Ramón Beivide

A complete mechanism for tolerating multiple failures in parallel computer systems, denoted Immunet, is described in this paper. Immunet can be applied to arbitrary topologies, either regular or irregular, exhibiting graceful performance degradation in both cases. Provided that the network remains connected, Immunet is able to deal with any number of failures regardless of their spatial and temporal distribution. Our mechanism operates on the basis of a dynamic network reconfiguration in response to failures. The network reconfiguration only employs local information recorded at the router nodes, which leads to a highly scalable system. In addition, its low cost and overhead permit a practicable hardware implementation. Finally, Immunet could circumvent failures transparently to applications running on a parallel system, because it does not require dropping in-flight traffic. Only packets stored in or traveling through a broken component need to be recovered by higher system levels.

International Conference on Supercomputing | 2000

Improving parallel system performance by changing the arrangement of the network links

Valentin Puente; Cruz Izu; José A. Gregorio; Ramón Beivide; J. M. Prellezo; Fernando Vallejo

The Midimew network is an excellent contender for implementing the communication subsystem of a high-performance computer. This network is an optimal 2D topology in the sense that there is no other symmetric direct network of degree 4 with a lower average distance or diameter. In fact, it reduces the diameter of the well-known torus network by a factor of approximately √2. Although the topology was proposed and analyzed a decade ago, the lack of simple deadlock-avoidance mechanisms has prevented its use to date. This study solves this drawback by applying the Bubble switching mechanism, a low-cost deadlock-avoidance strategy developed by the authors. Moreover, by using routing tables we can configure our virtual cut-through adaptive router to implement either a torus or a Midimew network. Thus, we can exploit the topological advantages of Midimew networks simply by changing the arrangement of the wrap-around connections of its torus counterpart, without increasing the network implementation cost. To prove this assertion, we have carried out a thorough evaluation, from the hardware cost of the router to the parallel system performance under real loads.
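The diameter advantage over the torus can be checked numerically. The sketch assumes a circulant description of a Midimew with N nodes, in which node i links to i ± b and i ± (b − 1) modulo N, with b = ⌈√(N/2)⌉; treat that definition as an assumption here rather than the paper's own formulation. It compares breadth-first-search diameters against a k x k torus and against the smallest k satisfying 2k² + 2k + 1 ≥ N, which is the most nodes any degree-4 circulant can place within k hops. Function names are hypothetical.

```python
import math
from collections import deque
from typing import Callable


def bfs_diameter(neighbors: Callable[[int], list[int]]) -> int:
    """Largest hop distance from node 0. Tori and circulant graphs are
    vertex-transitive, so this equals the network diameter."""
    dist = {0: 0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in neighbors(u):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def torus_diameter(k: int) -> int:
    """Diameter of a k x k torus; node u sits at row u // k, column u % k."""
    def nbrs(u: int) -> list[int]:
        x, y = divmod(u, k)
        return [((x + 1) % k) * k + y, ((x - 1) % k) * k + y,
                x * k + (y + 1) % k, x * k + (y - 1) % k]
    return bfs_diameter(nbrs)


def midimew_diameter(n: int) -> int:
    """Diameter of the assumed circulant Midimew model with n nodes."""
    b = math.ceil(math.sqrt(n / 2))
    def nbrs(u: int) -> list[int]:
        return [(u + b) % n, (u - b) % n, (u + b - 1) % n, (u - b + 1) % n]
    return bfs_diameter(nbrs)


def circulant_lower_bound(n: int) -> int:
    """Smallest k with 2k^2 + 2k + 1 >= n."""
    k = 0
    while 2 * k * k + 2 * k + 1 < n:
        k += 1
    return k
```

For 16 nodes the model gives 3 hops against 4 for the 4 x 4 torus, matching the lower bound. Asymptotically the bound grows like √(N/2) while the torus diameter grows like √N (for even k), which is where the factor of roughly √2 comes from.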
