Is this you? Create Your Porfile

Jaume Joven

École Polytechnique Fédérale de Lausanne

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Jaume Joven is active.

Explore More

Publication

Featured researches published by Jaume Joven.

parallel, distributed and network-based processing | 2008

xENoC - An eXperimental Network-On-Chip Environment for Parallel Distributed Computing on NoC-based MPSoC Architectures

Jaume Joven; Oriol Font-Bach; David Castells-Rufas; Ricardo Martínez; Lluís Terés; Jordi Carrabina

This paper describes xENoC, an automatic and component re-use HW-SW environment to build simulatable and synthesizable Network-on-Chip-based MPSoC architectures. xENoC is based on a tool, named NoCWizard, which uses an eXtensible Markup Language (XML) specification, and a set of modularized components and templates to generate many types of NoC instances by using Verilog HDL. This NoC models can be customized in terms of topology, tile location/mapping, RNIs generation, different types of routers, FIFO and packet/flit sizes, by simply modifying the XML specifications. Furthermore, xENoC is also composed of software components, i.e. RNI drivers and a parallel programming model, embedded Message Passing Interface (eMPI), which let us to carry out a complete HW-SW co-design methodology to design distributed-memory NoC-based MPSoCs parallel applications. Through xENoC different distributed-memory NoC-based MPSoCs designs have been created simulated and prototyped in physical platforms (e.g. FPGA boards), and some parallel multiprocessor test traffic applications are running there as system level demonstrators.

Philosophical Transactions of the Royal Society A | 2014

On the use of inexact, pruned hardware in atmospheric modelling

Peter D. Düben; Jaume Joven; Avinash Lingamneni; Hugh McNamara; Giovanni De Micheli; Krishna V. Palem; T. N. Palmer

Inexact hardware design, which advocates trading the accuracy of computations in exchange for significant savings in area, power and/or performance of computing hardware, has received increasing prominence in several error-tolerant application domains, particularly those involving perceptual or statistical end-users. In this paper, we evaluate inexact hardware for its applicability in weather and climate modelling. We expand previous studies on inexact techniques, in particular probabilistic pruning, to floating point arithmetic units and derive several simulated set-ups of pruned hardware with reasonable levels of error for applications in atmospheric modelling. The set-up is tested on the Lorenz ‘96 model, a toy model for atmospheric dynamics, using software emulation for the proposed hardware. The results show that large parts of the computation tolerate the use of pruned hardware blocks without major changes in the quality of short- and long-time diagnostics, such as forecast errors and probability density functions. This could open the door to significant savings in computational cost and to higher resolution simulations with weather and climate models.

international symposium on system-on-chip | 2006

A Validation And Performance Evaluation Tool for ProtoNoC

David Castells-Rufas; Jaume Joven; Jordi Carrabina

Simulating a NoC at the RTL level can be extremely complex, the simulation of a relatively small NoC, such as a 4 times 4 mesh, can involve observing thousands of wires on a standard HDL simulator. The facilities of JHDL to extend the simulator environment together with the possibility to fully analyze the runtime object model of the circuit offers a great opportunity to develop modules that address complex features like high level validation and performance evaluation. We present a developed tool that allows defining a NoC architecture models with some flexibility. Traffic generation processes described with high level language can be added to the model. Simulation can be used to validate the system operation on realistic conditions and get accurate values of expected performance

international conference on hardware/software codesign and system synthesis | 2010

Exploring programming model-driven QoS support for NoC-based platforms

Jaume Joven; Andrea Marongiu; Federico Angiolini; Luca Benini; Giovanni De Micheli

Networks-on-Chip (NoCs) are being increasingly considered as a central enabling technology to communication-centric designs as more and more IP blocks are integrated on the same SoC. Embedded applications, in turn, are becoming extremely sophisticated, and often require guaranteed levels of service and performance. The complex and non-uniform nature of network traffic generated by parallel applications running on a large number of possibly heterogeneous IPs makes a strong case for providing Quality of Service (QoS) support for traffic streams over the NoC infrastructure. In this paper we consider an integrated hardware/software approach for delivering QoS at the application level. We designed NoC hardware support, low-level middleware and APIs which enable QoS control at the application level. Furthermore, we identify a set of programming abstractions useful to associate the notion of priority to each running task in the system. An initial implementation of this programming model is also presented, which leverages a set of extensions to a MPSoC-specific OpenMP compiler and run-time environment.

parallel computing | 2013

An integrated, programming model-driven framework for NoC-QoS support in cluster-based embedded many-cores

Jaume Joven; Andrea Marongiu; Federico Angiolini; Luca Benini; G. De Micheli

Embedded SoC designs are embracing the many-core paradigm to deliver the required performance to run an ever-increasing number of applications in parallel. Networks-on-Chip (NoC) are considered as a convenient technology to implement many-core embedded platforms. The complex and non-uniform nature of the traffic flows generated when multiple parallel applications are running simultaneously calls for Quality-of-Service (QoS) extensions in the NoC, but to efficiently exploit similar services it is necessary to expose them to the software in a easy-to-use yet efficient manner. In this paper we present an integrated hardware/software approach for delivering QoS on top of an hybrid OpenMP-MPI parallel programming model. Our experimental results show the effectiveness of our proposal over a broad range of benchmarks and application mappings, demonstrating the ability to manage parallelism under QoS requirements effortlessly from the programming model.

international symposium on industrial embedded systems | 2011

HW-SW implementation of a decoupled FPU for ARM-based Cortex-M1 SoCs in FPGAs

Jaume Joven; Per Strict; David Castells-Rufas; Akash Bagdia; Giovanni De Micheli; Jordi Carrabina

Nowadays industrial monoprocessor and multiprocessor systems make use of hardware floating-point units (FPUs) to provide software acceleration and better precision due to the necessity to compute complex software applications. This paper presents the design of an IEEE-754 compliant FPU, targeted to be used with ARM Cortex-M1 processor on FPGA SoCs. We face the design of an AMBA-based decoupled FPU in order to avoid changing of the Cortex-M1 ARMv6-M architecture and the ARM compiler, but as well to eventually share it among different processors in our Cortex-M1 MPSoC design. Our HW-SW implementation can be easily integrated to enable hardware-assisted floating-point operations transparently from the software application. This work reports synthesis results of our Cortex-M1 SoC architecture, as well as our FPU in Altera and Xilinx FPGAs, which exhibit competitive numbers compared to the equivalent Xilinx FPU IP core. Additionally, single and double precision tests have been performed under different scenarios showing best case speedups between 8.8× and 53.2× depending on the FP operation when are compared to FP software emulation libraries.

international conference on electronics, circuits, and systems | 2010

A NoC-based multi-{soft}core with 16 cores

Eduard Fernandez-Alonso; David Castells-Rufas; Sergi Risueno; Jordi Carrabina; Jaume Joven

The number of resources available in the largest reconfigurable devices enables the synthesis of systems with more than 100 Soft-Core processors. Although a feasible and attainable option, few such systems have been built and few published works propose methods to create, program and optimize this kind of system. In this work we present a methodology that helps the designer to build Multi Processor System on Chip (MPSoC) systems based on Network on Chip (NoC) interconnection, and also to develop applications and to analyze their performance. In this methodology several tools have been combined such as SOPC builder from Altera, NoCMaker from UAB, and Vampir from GWT-TUD. As a result, a NoC-based MPSoC consisting of 16 NIOS II Soft-Core processors has been built and successfully tested with several applications.

international conference on parallel processing | 2010

Scalability of a Parallel JPEG Encoder on Shared Memory Architectures

David Castells-Rufas; Jaume Joven; Jordi Carrabina

Embedded multimedia systems are expected to fully embrace the future many-core wave. As a consequence parallel programming is being revamped as the only way to exploit the power of coming chips. While waiting for them we try to extrapolate some lessons learned from current multi-cores to influence future architectures and programming methods. In this paper we investigate the parallelism and scalability of a JPEG image encoder, which is a typical embedded application, on several shared memory machines using the OpenMP programming framework. We identify the Huffman coding as the bottleneck that blocks the application from scaling above a 7x factor. We propose a strategy to parallelize the Huffman coding, which introduces a small degradation in some parts of the image, allowing to reach higher speedup factors. A factor of 18.8x has been reached in SGI Altix 4700 using 22 threads. Contrasting these results with some previous works using message passing architectures we consider that the use of OpenMP on top of shared memory architectures should be reconsidered for future chips in favor of message passing architectures and programming models.

Computers & Electrical Engineering | 2012

Development process for clusters on a reconfigurable chip

Eduard Fernandez-Alonso; David Castells-Rufas; Jaume Joven; Jordi Carrabina

Reconfigurable MPSoCs (Multiprocessor System-on-Chip) could be viable for certain applications niche where the flexibility of FPGAs (Field-Programmable Gate Array) and software is needed, and a small number of units dismiss other silicon options. However, their design complexity is very high, and raises additional problems, i.e. the definition of a suitable programming model, an efficient memory organization, and the need for ways to optimize application performance. In this paper, we propose a complete development process, which addresses these problems by complementing the current SoC (System-on-Chip) development process with additional steps to support parallel programming and software optimization. This work explains systematically problems and solutions to achieve a FPGA-based MPSoC following our systematic flow and offering tools and techniques to develop parallel applications for such systems.

field-programmable technology | 2011

Sharing FPUs in many-soft-cores

David Castells-Rufas; Eduard Fernandez-Alonso; Jordi Carrabina; Jaume Joven

Modern top of the line FPGAs can already host hundreds of simple soft-core processors. Because soft-cores often support floating point units through external interfaces this opens the door to explore the convenience for sharing the floating point units among a number of processors in many-soft-cores. We build two variants of a many-soft-core with 16 NIOSII cores to test if sharing the FPU gives an important area reduction and to test if the introduced time overhead is significant. We find out that area savings are a 30% of the non-shared FPU version for a 16 core system and that the overhead in clock cycles is almost inexistent for simple applications like matrix multiplication and below 2% for a parallel Mandelbrot application. However, if we consider the reduction of the maximum operational frequency that happens when the number of processors increase, we get that sharing among 8 processors is a very good option, and that it is not advisable to share among more than 12 processors because of the excessive time overhead

Explore More