Publication


Featured research published by Antoni Roca.


networks on chips | 2010

Addressing Manufacturing Challenges with Cost-Efficient Fault Tolerant Routing

Samuel Rodrigo; Jose Flich; Antoni Roca; Simone Medardoni; Davide Bertozzi; Jesus Camacho; Federico Silla; José Duato

The high-performance computing domain is enriching with the inclusion of Networks-on-chip (NoCs) as a key component of many-core (CMPs or MPSoCs) architectures. NoCs face the communication scalability challenge while meeting tight power, area and latency constraints. Designers must address new challenges that were not present before. Defective components, the enhancement of application-level parallelism or power-aware techniques may break topology regularity, thus, efficient routing becomes a challenge. In this paper, uLBDR (Universal Logic-Based Distributed Routing) is proposed as an efficient logic-based mechanism that adapts to any irregular topology derived from 2D meshes, being an alternative to the use of routing tables (either at routers or at end-nodes). uLBDR requires a small set of configuration bits, thus being more practical than large routing tables implemented in memories. Several implementations of uLBDR are presented highlighting the trade-off between routing cost and coverage. The alternatives span from the previously proposed LBDR approach (with 30% of coverage) to the uLBDR mechanism achieving full coverage. This comes with a small performance cost, thus exhibiting the trade-off between fault tolerance and performance.


IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2011

Cost-Efficient On-Chip Routing Implementations for CMP and MPSoC Systems

Samuel Rodrigo; Jose Flich; Antoni Roca; Simone Medardoni; Davide Bertozzi; Jesus Camacho; Federico Silla; José Duato

The high-performance computing domain is enriching with the inclusion of networks-on-chip (NoCs) as a key component of many-core (CMPs or MPSoCs) architectures. NoCs face the communication scalability challenge while meeting tight power, area, and latency constraints. Designers must address new challenges that were not present before. Defective components, the enhancement of application-level parallelism, or power-aware techniques may break topology regularity, thus, efficient routing becomes a challenge. This paper presents universal logic-based distributed routing (uLBDR), an efficient logic-based mechanism that adapts to any irregular topology derived from 2-D meshes, instead of using routing tables. uLBDR requires a small set of configuration bits, thus being more practical than large routing tables implemented in memories. Several implementations of uLBDR are presented highlighting the tradeoff between routing cost and coverage. The alternatives span from the previously proposed LBDR approach (with 30% of coverage) to the uLBDR mechanism achieving full coverage. This comes with a small performance cost, thus exhibiting the tradeoff between fault tolerance and performance. Power consumption, area, and delay estimates are also provided highlighting the efficiency of the mechanism. To do this, different router models (one for CMPs and one for MPSoCs) have been designed as a proof of concept.


networks on chips | 2010

Improving the Performance of GALS-Based NoCs in the Presence of Process Variation

Carles Hernandez; Antoni Roca; Federico Silla; Jose Flich; José Duato

Current integration scales allow designing chip multiprocessors (CMP) where cores are interconnected by means of a network-on-chip (NoC). Unfortunately, the small feature size of current integration scales causes some unpredictability in manufactured devices because of process variation. In NoCs, variability may affect links and routers so that they no longer match the parameters established at design time. In this paper we first analyze the way that manufacturing deviations affect the components of a NoC by applying a comprehensive and detailed variability model to 200 instances of an 8x8 mesh NoC synthesized using 45nm technology. A second contribution of this paper is showing that GALS-based NoCs present communication bottlenecks under process variation. To overcome this performance reduction we draft a novel approach, called performance domains, intended to reduce the negative impact of variability on application execution time. This mechanism is suitable when several applications are simultaneously running in the CMP chip.


field programmable logic and applications | 2012

DESA: Distributed Elastic Switch Architecture for efficient networks-on-FPGAs

Antoni Roca; Jose Flich; Giorgos Dimitrakopoulos

Networks-on-FPGA consist of a network of switches connected with point-to-point links and can sufficiently cover the communication needs of complex systems implemented on FPGA platforms. The efficient implementation of such networks requires the appropriate tuning of their components to the characteristics of the FPGA's logic and memory resources. In this paper, we present a distributed switch architecture that makes the best use of the structure of the FPGA and achieves significant area/delay savings when compared to baseline switch architectures; more than 50% increase in operating frequency is achieved for similar area. The proposed switch operates as an elastic pipeline and can be spread throughout the FPGA chip irrespective of the topology of the network and without limiting the placement options of the corresponding EDA tools.


Microprocessors and Microsystems | 2011

A low-latency modular switch for CMP systems

Antoni Roca; Jose Flich; Federico Silla; José Duato

As technology advances, the number of cores in Chip MultiProcessor systems and MultiProcessor Systems-on-Chips keeps increasing. The network must provide sustained throughput and ultra-low latencies. In this paper we propose new pipelined switch designs focused on reducing the switch latency. We identify the switch component that limits the switch frequency: the arbiter. Then, we simplify the arbiter logic by using multiple smaller arbiters, at the cost of greatly increasing the switch area. To solve this problem, a second design is presented where the routing, traversal, and arbitration tasks are mixed. Results demonstrate a switch latency reduction ranging from 10% to 21%. Network latency is reduced in a range from 11% to 15%.


Signal, Image and Video Processing | 2008

Reduced decoder complexity and latency in pixel-domain Wyner-Ziv video coders

Marleen Morbée; Antoni Roca; Josep Prades-Nebot; Aleksandra Pižurica; Wilfried Philips

In some video coding applications, it is desirable to reduce the complexity of the video encoder at the expense of a more complex decoder. Wyner–Ziv (WZ) video coding is a new paradigm that aims to achieve this. To allocate a proper number of bits to each frame, most WZ video coding algorithms use a feedback channel, which allows the decoder to request additional bits when needed. However, due to these multiple bit requests, the complexity and the latency of WZ video decoders increase massively. To overcome these problems, in this paper we propose a rate allocation (RA) algorithm for pixel-domain WZ video coders. This algorithm estimates at the encoder the number of bits needed for the decoding of every frame while still keeping the encoder complexity low. Experimental results show that, by using our RA algorithm, the number of bit requests over the feedback channel—and hence, the decoder complexity and the latency—are significantly reduced. Meanwhile, a very near-to-optimal rate-distortion performance is maintained.


Journal of Systems Architecture | 2013

Silicon-aware distributed switch architecture for on-chip networks

Antoni Roca; Carles Hernandez; Jose Flich; Federico Silla; José Duato

It is well known that current Chip MultiProcessor (CMP) and high-end MultiProcessor System-on-Chip (MPSoC) designs are growing in their number of components. Networks-on-Chip (NoC) provide the required connectivity for such CMP and MPSoC designs at reasonable costs. As technology advances, links become the critical component in the NoC due to their long delay and power consumption, which become unacceptable for long global interconnects. In this paper we present a new switch architecture that reduces the negative impact of links on the NoC. We call our proposal distributed switch. The distributed switch spreads the circuitry of the switch onto the links. Thus, packets are buffered, routed, and forwarded at the same time they are crossing the link. Distributing a modular switch onto the link improves the trade-off between the power consumption and the operating frequency of the entire network. On the other hand, area resources are increased. Additionally, the distributed switch presents better fault tolerance and process variation behavior with respect to a non-distributed switch.


IEEE Computer Architecture Letters | 2011

Fault-Tolerant Vertical Link Design for Effective 3D Stacking

Carles Hernandez; Antoni Roca; Jose Flich; Federico Silla; José Duato

Recently, 3D stacking has been proposed to alleviate the memory bandwidth limitation arising in chip multiprocessors (CMPs). As the number of integrated cores in the chip increases, the access to external memory becomes the bottleneck, thus demanding larger memory amounts inside the chip. The most accepted solution to implement vertical links between stacked dies is to use Through Silicon Vias (TSVs). However, TSVs are exposed to misalignment and random defects compromising the yield of the manufactured 3D chip. A common solution to this problem is over-provisioning, which impacts area and cost. In this paper, we propose a fault-tolerant vertical link design. With its adoption, fault-tolerant vertical links can be implemented in a 3D chip design at low cost without the need of adding redundant TSVs (no over-provision). Preliminary results are very promising as the fault-tolerant vertical link design increases switch area only by 6.69% while the achieved interconnect yield tends to 100%.


digital systems design | 2010

A Latency-Efficient Router Architecture for CMP Systems

Antoni Roca; Jose Flich; Federico Silla; José Duato

As technology advances, the number of cores in Chip Multi Processor systems (CMPs) and Multi Processor Systems-on-Chips (MPSoCs) keeps increasing. Current test chips and products reach tens of cores, and they are expected to reach hundreds of cores in the near future. Such complexity demands an efficient network-on-chip (NoC). The common choice to build such networks is the 2D mesh topology (as it matches the regular tile-based design) and the Dimension-Order Routing (DOR) algorithm (because of its simplicity). The network in such systems must provide sustained throughput and ultra-low latencies. One of the key components in the network is the router, and thus, it plays a major role when designing for such performance levels. In this paper we propose a new pipelined router design focused on reducing the router latency. As a first step we identify the router components that take most of the critical path, and thus limit the router frequency. In particular, the arbiter is the one limiting the performance of the router. Based on this fact, we simplify the arbiter logic by using multiple smaller arbiters. The initial set of requests in the initial arbiter is then distributed over the smaller arbiters that operate in parallel. With this design procedure, and with a proper internal router organization, different router architectures are evolved. All of them enable the use of smaller arbiters in parallel by replicating ports and assuming the use of the DOR algorithm. The net result of such changes is a faster router. Preliminary results demonstrate a router latency reduction ranging from 10% to 21% with an increase of the router area. Network latency is reduced in a range from 11% to 15%.
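
The abstract above assumes Dimension-Order Routing on a 2D mesh. As a rough illustration only, and not the router design from the paper, the following sketch shows the usual XY variant of DOR: a packet first corrects its X coordinate, then its Y coordinate. The function names, the port labels, and the convention that "N" means increasing y are assumptions made for this example.

```python
from typing import List, Tuple

Node = Tuple[int, int]  # (x, y) coordinates of a router in the mesh

# Assumed convention for this sketch: "E" increases x, "N" increases y.
STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def dor_next_port(current: Node, dest: Node) -> str:
    """Pick the output port under XY dimension-order routing."""
    cx, cy = current
    dx, dy = dest
    if cx != dx:  # resolve the X dimension first
        return "E" if dx > cx else "W"
    if cy != dy:  # then resolve the Y dimension
        return "N" if dy > cy else "S"
    return "LOCAL"  # packet has arrived; deliver to the local core


def dor_path(src: Node, dest: Node) -> List[Node]:
    """List every router visited from src to dest, both included."""
    path = [src]
    node = src
    while True:
        port = dor_next_port(node, dest)
        if port == "LOCAL":
            return path
        ox, oy = STEP[port]
        node = (node[0] + ox, node[1] + oy)
        path.append(node)
```

Because the X dimension is always exhausted before any Y hop is taken, a packet never turns from Y back to X. That restriction on turns is what lets a router design rely on a fixed, small set of input-to-output combinations.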


Proceedings of SPIE | 2009

Improvements on image authentication and recovery using distributed source coding

Nitin Khanna; Antoni Roca; George T.-C. Chiu; Jan P. Allebach; Edward J. Delp

This paper investigates the performance of, and proposes modifications to, earlier methods for image authentication using distributed source coding. This approach works well on images that have undergone affine geometric transformations such as rotation and resizing and intensity transformations such as contrast and brightness adjustment. The results show that the improvements proposed here can be used to make the original scheme for image authentication robust to affine geometric and intensity transformations. The modifications have much lower computational complexity than other schemes for estimation of channel parameters.

Collaboration


Top Co-Authors

José Duato

Polytechnic University of Valencia

Federico Silla

Polytechnic University of Valencia

Josep Prades-Nebot

Polytechnic University of Valencia

Carles Hernandez

Barcelona Supercomputing Center

Samuel Rodrigo

Simula Research Laboratory