Publication


Featured research published by David Castells-Rufas.


parallel, distributed and network-based processing | 2008

xENoC - An eXperimental Network-On-Chip Environment for Parallel Distributed Computing on NoC-based MPSoC Architectures

Jaume Joven; Oriol Font-Bach; David Castells-Rufas; Ricardo Martínez; Lluís Terés; Jordi Carrabina

This paper describes xENoC, an automatic, component-reuse HW-SW environment for building simulatable and synthesizable Network-on-Chip-based MPSoC architectures. xENoC is based on a tool named NoCWizard, which uses an eXtensible Markup Language (XML) specification and a set of modularized components and templates to generate many types of NoC instances in Verilog HDL. These NoC models can be customized in terms of topology, tile location/mapping, RNI generation, router type, and FIFO and packet/flit sizes, simply by modifying the XML specification. xENoC also includes software components, i.e. RNI drivers and a parallel programming model, embedded Message Passing Interface (eMPI), which let us carry out a complete HW-SW co-design methodology for parallel applications on distributed-memory NoC-based MPSoCs. Using xENoC, different distributed-memory NoC-based MPSoC designs have been created, simulated, and prototyped on physical platforms (e.g. FPGA boards), and several parallel multiprocessor test-traffic applications run on them as system-level demonstrators.
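The XML-driven generation described in this abstract can be illustrated with a minimal sketch. The XML schema, attribute names, and the `router` module name below are hypothetical and are not taken from NoCWizard; the sketch only shows the general idea of turning a declarative specification into one router instantiation per mesh tile.

```python
import xml.etree.ElementTree as ET

# Hypothetical specification format (NOT the NoCWizard schema).
SPEC = """
<noc topology="mesh" rows="2" cols="3" flit_bits="32" fifo_depth="4">
  <tile x="0" y="0" core="cpu0"/>
  <tile x="2" y="1" core="mem0"/>
</noc>
"""


def parse_spec(xml_text):
    """Read mesh size, flit/FIFO parameters, and the tile mapping."""
    root = ET.fromstring(xml_text.strip())
    mapping = {
        (int(t.get("x")), int(t.get("y"))): t.get("core")
        for t in root.findall("tile")
    }
    return {
        "rows": int(root.get("rows")),
        "cols": int(root.get("cols")),
        "flit_bits": int(root.get("flit_bits")),
        "fifo_depth": int(root.get("fifo_depth")),
        "mapping": mapping,
    }


def router_instances(spec):
    """Emit one illustrative Verilog-style instantiation line per tile."""
    lines = []
    for y in range(spec["rows"]):
        for x in range(spec["cols"]):
            core = spec["mapping"].get((x, y), "unused")
            lines.append(
                f"router #(.FLIT_BITS({spec['flit_bits']}), "
                f".FIFO_DEPTH({spec['fifo_depth']})) r_{x}_{y} (); // tile: {core}"
            )
    return lines


if __name__ == "__main__":
    for line in router_instances(parse_spec(SPEC)):
        print(line)
```

Changing `rows`, `cols`, or `flit_bits` in the specification changes the generated instances without touching the generator, which is the kind of customization the abstract attributes to editing the XML.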


Biomedical Signal Processing and Control | 2015

Simple real-time QRS detector with the MaMeMi filter

David Castells-Rufas; Jordi Carrabina

Detection of QRS complexes in ECG signals is required to determine heart rate, and it is an important step in the study of cardiac disorders. ECG signals are usually affected by both low- and high-frequency noise. To improve the accuracy of QRS detectors, several methods have been proposed to filter out the noise and detect the characteristic pattern of the QRS complex. Most existing methods suffer from relatively high computational complexity or high resource needs, making them poorly suited to implementation on portable embedded systems, wearable devices, or ultra-low-power chips. We present a new method that detects the QRS signal in a simple way, with minimal computational cost and resource needs, using a novel non-linear filter.


international symposium on system-on-chip | 2006

A Validation And Performance Evaluation Tool for ProtoNoC

David Castells-Rufas; Jaume Joven; Jordi Carrabina

Simulating a NoC at the RTL level can be extremely complex: the simulation of a relatively small NoC, such as a 4×4 mesh, can involve observing thousands of wires in a standard HDL simulator. JHDL's facilities for extending the simulator environment, together with the possibility of fully analyzing the runtime object model of the circuit, offer a great opportunity to develop modules that address complex features such as high-level validation and performance evaluation. We present a tool that allows NoC architecture models to be defined with some flexibility. Traffic generation processes described in a high-level language can be added to the model. Simulation can be used to validate system operation under realistic conditions and to obtain accurate values of expected performance.
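A rough count shows why even a 4×4 mesh can reach thousands of wires. The sketch below counts only the router-to-router wires of an n×n mesh; the 32-bit flit width and the two handshake wires per channel are assumptions chosen for illustration, not figures from the paper.

```python
def mesh_links(n):
    """Bidirectional neighbor links in an n x n mesh.

    Each of the n rows has n - 1 horizontal links, and each of the
    n columns has n - 1 vertical links.
    """
    return 2 * n * (n - 1)


def inter_router_wires(n, flit_bits=32, control_wires=2):
    """Wires between routers, assuming two unidirectional channels per link.

    flit_bits and control_wires are hypothetical values for illustration.
    """
    channels = 2 * mesh_links(n)
    return channels * (flit_bits + control_wires)


if __name__ == "__main__":
    print(mesh_links(4))          # 24 links
    print(inter_router_wires(4))  # 48 channels * 34 wires = 1632
```

Under these assumptions the links alone account for 1632 wires, before counting any signal inside the routers or the network interfaces.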


field-programmable custom computing machines | 2007

Jumble: A Hardware-in-the-Loop Simulation System for JHDL

David Castells-Rufas; Jordi Carrabina

This paper presents a new verification system for FPGA-based designs described in the JHDL hardware description language. The method consists of performing hardware emulation of designer-selected blocks in a co-simulation environment. Although JHDL has a hardware execution mode, it does not provide fine control over which blocks are executed in hardware, and it is based on Xilinx readback technology. In this paper we present a method to extend the simulation environment with fine control of the hardware emulation system, a method to instrument the design for debugging, and a process that automatically creates the interface between the simulator and the emulated hardware block. The resulting system does not offer 100% observability and controllability of hardware blocks. Nevertheless, its interactivity provides a solid basis for incremental verification while offering the possibility of substantial simulation speedups.


international symposium on industrial embedded systems | 2011

HW-SW implementation of a decoupled FPU for ARM-based Cortex-M1 SoCs in FPGAs

Jaume Joven; Per Strict; David Castells-Rufas; Akash Bagdia; Giovanni De Micheli; Jordi Carrabina

Industrial monoprocessor and multiprocessor systems now use hardware floating-point units (FPUs) to accelerate software and improve precision when running computationally complex applications. This paper presents the design of an IEEE-754 compliant FPU intended for use with the ARM Cortex-M1 processor on FPGA SoCs. We design an AMBA-based decoupled FPU in order to avoid changing the Cortex-M1 ARMv6-M architecture and the ARM compiler, and also to eventually share it among different processors in our Cortex-M1 MPSoC design. Our HW-SW implementation can be easily integrated to enable hardware-assisted floating-point operations transparently to the software application. This work reports synthesis results for our Cortex-M1 SoC architecture and for our FPU on Altera and Xilinx FPGAs, which are competitive with the equivalent Xilinx FPU IP core. Additionally, single- and double-precision tests performed under different scenarios show best-case speedups between 8.8× and 53.2×, depending on the FP operation, compared to FP software emulation libraries.


international conference on parallel processing | 2010

Scalability of a Parallel JPEG Encoder on Shared Memory Architectures

David Castells-Rufas; Jaume Joven; Jordi Carrabina

Embedded multimedia systems are expected to fully embrace the coming many-core wave. As a consequence, parallel programming is being revamped as the only way to exploit the power of upcoming chips. While waiting for them, we try to extrapolate lessons learned from current multi-cores to influence future architectures and programming methods. In this paper we investigate the parallelism and scalability of a JPEG image encoder, a typical embedded application, on several shared memory machines using the OpenMP programming framework. We identify Huffman coding as the bottleneck that prevents the application from scaling beyond a factor of 7x. We propose a strategy to parallelize the Huffman coding, which introduces a small degradation in some parts of the image but makes higher speedup factors reachable. A factor of 18.8x has been reached on an SGI Altix 4700 using 22 threads. Contrasting these results with previous work on message passing architectures, we consider that the use of OpenMP on top of shared memory architectures should be reconsidered for future chips in favor of message passing architectures and programming models.
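The 7x ceiling and the 18.8x result can be put in context with Amdahl's law. The sketch below assumes the ceiling comes purely from a fixed serial fraction, which gives a serial fraction of 1/7; that value is an inference for illustration, not a figure reported in the abstract. The efficiency calculation uses only the 18.8x and 22-thread numbers quoted above.

```python
def amdahl_speedup(serial_fraction, workers):
    """Amdahl's law: speedup with a fixed serial fraction of the work."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


def parallel_efficiency(speedup, workers):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / workers


if __name__ == "__main__":
    # Assumption: a 7x ceiling corresponds to a serial fraction of 1/7.
    serial_fraction = 1.0 / 7.0
    for workers in (1, 2, 4, 8, 22, 1024):
        print(workers, round(amdahl_speedup(serial_fraction, workers), 2))

    # Reported result: 18.8x on 22 threads after parallelizing Huffman coding.
    print(round(parallel_efficiency(18.8, 22), 3))
```

Under this model the speedup approaches 7 no matter how many workers are added, so exceeding it requires shrinking the serial part itself, which is what parallelizing the Huffman stage does.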


arXiv: Distributed, Parallel, and Cluster Computing | 2016

Energy Efficiency of Many-Soft-Core Processors.

David Castells-Rufas; Albert Saà-Garriga; Jordi Carrabina

Growing integration capacity makes it possible to instantiate hundreds of soft-core processors in a single FPGA to create a reconfigurable multiprocessing system. FPGAs have recently been shown to offer higher energy efficiency than alternative platforms such as CPUs and GPGPUs for certain workloads, and they are increasingly used in data centers. In this paper we investigate whether many-soft-core processors can achieve similar levels of energy efficiency while providing a general-purpose environment that is easier to program and can run other applications without reconfiguring the device. With a simple application example we are able to create a reconfigurable multiprocessing system that achieves an energy efficiency 58 times higher than a recent ultra-low-power processor and 124 times higher than a recent high-performance GPGPU.


Archive | 2012

Survey of NoC and Programming Models Proposals for MPSoC

Eduard Fernandez-Alonso; David Castells-Rufas; Jaume Joven; Jordi Carrabina


Proceedings of the 2nd Workshop on Programming Models for Emerging Architectures (PMEA'10) | 2010

QoS-ocMPI: QoS-aware on-chip Message Passing Library for NoC-based Many-Core MPSoCs

Jaume Joven; Federico Angiolini; David Castells-Rufas; Giovanni De Micheli; Jordi Carrabina


Archive | 2009

NocMaker: A cross-platform open-source design space exploration tool for networks on chip

David Castells-Rufas; Juan AguarOn Joven; Sergi Risueno; Edouard Fernandez; Jordi Carrabina

Collaboration


Dive into David Castells-Rufas's collaborations.

Top Co-Authors

Jordi Carrabina

Autonomous University of Barcelona

Jaume Joven

École Polytechnique Fédérale de Lausanne

Albert Saà-Garriga

Autonomous University of Barcelona

Eduard Fernandez-Alonso

Autonomous University of Barcelona

Giovanni De Micheli

École Polytechnique Fédérale de Lausanne

Sergi Risueno

Autonomous University of Barcelona

Federico Angiolini

École Polytechnique Fédérale de Lausanne

Lluís Terés

Spanish National Research Council
