David Novo

Katholieke Universiteit Leuven

Publication

Featured research published by David Novo.

international conference on communications | 2008

Selective Spanning with Fast Enumeration: A Near Maximum-Likelihood MIMO Detector Designed for Parallel Programmable Baseband Architectures

Min Li; Bruno Bougard; Eduardo Lopez; A. Bourdoux; David Novo; L. Van der Perre; Francky Catthoor

ML and near-ML MIMO detectors have attracted a lot of interest in recent years. However, almost all of the reported implementations are delivered in ASIC or FPGA. Our contribution is to co-optimize the near-ML MIMO detector algorithm and implementation for parallel programmable baseband architectures, such as DSPs with VLIW, SIMD or vector processing features. Although for hardware the architecture can be tuned to fit algorithms, for programmable platforms the algorithm must be elaborately designed to fit the given architecture, so that efficient resource utilization can be achieved. By thoroughly analyzing and exploiting the interaction between algorithms and architectures, we propose the SSFE (selective spanning with fast enumeration) as an architecture-friendly near-ML MIMO detector. The SSFE has a distributed and greedy algorithmic structure that brings a completely deterministic and regular dataflow. The SSFE has been evaluated for coded OFDM transmissions over 802.11n channels and 3GPP channels. Under the same performance constraints, the complexity of the SSFE is significantly lower than that of the K-Best, the most popular detector implemented in hardware. More importantly, the SSFE can be easily parallelized and efficiently mapped on programmable baseband architectures. With the TI TMS320C6416, the SSFE delivers 37.4 to 125.3 Mbps throughput for 4×4 64-QAM transmissions. To the best of our knowledge, this is the first reported near-ML MIMO detector explicitly designed for parallel programmable architectures and demonstrated on a real-life platform.

design, automation, and test in europe | 2008

A coarse-grained array based baseband processor for 100Mbps+ software defined radio

Bruno Bougard; Bjorn De Sutter; Sebastien Rabou; David Novo; Osman Allam; Steven Dupont; Liesbet Van der Perre

The software-defined radio (SDR) concept aims to enable cost-effective multi-mode baseband solutions for wireless terminals. However, the growing complexity of new communication standards applying, e.g., multi-antenna transmission techniques, together with the reduced energy budget, is challenging SDR architectures. Coarse-grained array (CGA) processors are strong candidates to deliver both high performance and low power. The design of a candidate hybrid CGA-SIMD processor for an SDR baseband platform is presented. The processor, designed in the TSMC 90G process according to a dual-VT standard-cell flow, achieves a clock frequency of 400 MHz in worst-case conditions and consumes at most 310 mW active and 25 mW leakage power (typical conditions) when delivering up to 25.6 GOPS (16-bit). The mapping of a 20 MHz 2×2 MIMO-OFDM transmit and receive baseband functionality is detailed as an application case study, achieving 100 Mbps+ throughput with an average consumption of 220 mW.

design, automation, and test in europe | 2008

Optimizing near-ML MIMO detector for SDR baseband on parallel programmable architectures

Min Li; Bruno Bougard; Weiyu Xu; David Novo; Liesbet Van der Perre; Francky Catthoor

ML and near-ML MIMO detectors have attracted a lot of interest in recent years. However, almost all the reported implementations are delivered in ASICs or FPGAs. Our contribution is optimizing the near-ML MIMO detector for parallel programmable architectures, such as those with ILP and DLP features. In the proposed SSFE (selective spanning with fast enumeration), architecture-friendliness is explicitly introduced from the very beginning of the design flow. Importantly, high-level algorithmic transformations make the dataflow pattern and structure fit the architecture characteristics very well. We enable abundant vector parallelism with highly regular and deterministic dataflow in the SSFE; memory rearrangements, shuffling and non-predictable dynamism are all deliberately excluded. Hence, the SSFE can be easily parallelized and efficiently mapped onto ILP and DLP architectures. Furthermore, to fine-tune the SSFE on parallel architectures, extensive pre-compiler transformations are applied with the help of application-level information. These optimize not only computation operations but also address generation and memory accesses. Experiments show that the SSFE achieves very efficient resource utilization on real-life VLIW architectures. Specifically, with the SSFE the percentage of NOP instructions on VLIW is below 1%, even better than that achieved by the software-pipelined FFT. To the best of our knowledge, this is the first reported work on comprehensive optimization of near-ML MIMO detectors for parallel programmable architectures.

signal processing systems | 2005

Mapping a multiple antenna SDM-OFDM receiver on the ADRES coarse-grained reconfigurable processor

David Novo; Will Moffat; Veerle Derudder; Bruno Bougard

The increasing demand for multimodal wireless communication is driving designers towards software defined radio (SDR). Therefore, new high-performance reconfigurable platforms for baseband digital signal processing are required. Due to their flexibility, low reconfiguration overhead, performance and energy efficiency, coarse-grain reconfigurable arrays (CGRAs) are good candidates to fulfil this need. ADRES is a CGRA that combines a VLIW processor with a reconfigurable coarse-grain array. In this paper, we analyze the mapping on ADRES of one of the most demanding wireless OFDM DSP algorithms: the space division multiplexing (SDM) receiver. The latter will probably be mandatory in the next WLAN generation (802.11n). We also compare the obtained results with a mapping onto a VLIW processor, showing a gain of a factor of 5 in performance and a factor of 1.75 in power efficiency.

international conference on systems | 2009

Novel energy-efficient scalable soft-output SSFE MIMO detector architectures

Robert Fasthuber; Min Li; David Novo; Praveen Raghavan; Liesbet Van der Perre; Francky Catthoor

Energy-efficient scalable soft-output signal detectors are of significant interest in emerging Multiple-Input Multiple-Output (MIMO) wireless communication systems. However, traditional high-performance MIMO detectors consume a rather high amount of power, are typically constrained to one modulation scheme and are not scalable with the number of antennas. Hence, they are not well suited for future energy-efficient Software Defined Radio (SDR) platforms. This paper presents two energy-efficient scalable MIMO detector architectures: one optimized for high throughput, one for low area. Both architectures support 16-QAM as well as 64-QAM while offering soft output and near-ML performance. The 2×2 high-throughput architecture was implemented in 65 nm CMOS technology and subsequently scaled to 4×4 and 8×8. The 4×4 instance provides up to 300 Mbps throughput while consuming only 0.3 mm² area and 28 mW power. The 8×8 instance offers a throughput 10× better than the state of the art while consuming 2/3 less power. Thus, the proposed near-ML Selective Spanning with Fast Enumeration (SSFE) based detector architectures are not only multi-standard capable and scalable, they are also highly efficient.

international conference on embedded computer systems architectures modeling and simulation | 2007

Design of a low power pre-synchronization ASIP for multimode SDR terminals

T. Schuster; Bruno Bougard; Praveen Raghavan; Robert Priewasser; David Novo; Liesbet Van der Perre; Francky Catthoor

SDR enables cost-effective multi-mode terminals but still suffers from a significant energy penalty when compared to dedicated hardware solutions. At system level, this energy bottleneck can be addressed by capitalizing on heterogeneous MPSoC platforms where specific engines are dedicated to classes of functions with similar computation characteristics and duty cycle. In burst-based communication as in IEEE 802.11 or IEEE 802.16, burst detection functions have a high duty cycle and hence need an ultra-low-power implementation. Besides, programmability must be preserved to support multiple modes. A low-power pre-synchronization ASIP is designed targeting IEEE 802.11a/g/n and IEEE 802.16e synchronization at a 20 MHz input rate. Power simulations at gate level show that an IEEE 802.16e synchronization (20 MHz) can be carried out with an average power of 15.86 mW. This corresponds to an effective energy efficiency of 115.89 MOPS/mW (32-bit equivalent operations).

signal processing systems | 2009

Energy-performance Exploration of a CGA-based SDR Processor

David Novo; T. Schuster; Bruno Bougard; Andy Lambrechts; Liesbet Van der Perre; Francky Catthoor

Software-Defined Radio (SDR) provides the flexibility to enable cost-effective multi-mode terminals. However, the growing complexity of the new communication standards, which need to be executed with the reduced energy budget required by battery-powered devices, is still challenging architects. Although Coarse Grain Array (CGA)-based processors extended with domain-specific instructions are considered strong candidates to deliver both high performance and low power, the lack of efficient methodologies to derive optimal instances of such an architecture paradigm is still a major limitation. In this paper, an extensive energy-performance exploration of a CGA-based SDR processor is presented. This approach targets sufficient relative accuracy on the optimization metrics, which assures meaningful comparisons between different instances, while the absolute accuracy is relaxed and traded off against simulation time. The balance between the different sources of architectural parallelism, such as data-level and instruction-level parallelism, is crucial in order to achieve the required performance at minimum energy cost. Accordingly, the proposed method is used to select the optimal DLP–ILP combination required to run the symbol-based baseband processing of a 100 Mbps+ WLAN (Wireless Local Area Network) receiver in a CGA-based processor. As a result, a 4 × 4 array with four-way SIMD (Single Instruction, Multiple Data) extensions is shown to be the optimal instance, providing minimum energy consumption and real-time processing guarantees.

design, automation, and test in europe | 2008

Scenario-based fixed-point data format refinement to enable energy-scalable software defined radios

David Novo; Bruno Bougard; Andy Lambrechts; L. Van der Perre; Francky Catthoor

User demand, standards and products for digital nomadic communications are evolving quickly. The combination of this changing environment with the need for short time-to-market pushes for more flexible implementations. Software Defined Radios (SDR) have been introduced as the ultimate way to achieve such flexibility. The reduced energy budget required by battery-powered solutions makes the typical worst-case static dimensioning unaffordable under highly dynamic operating conditions. Instead, more energy-scalable algorithms and implementations are needed to provide flexibility while maintaining the required energy efficiency. In particular, energy-scalable implementations can exploit data format properties to offer different tradeoffs between accuracy and energy. In this paper, such a technique is developed and applied to the SDR implementation of a two-antenna 200 Mbps+ OFDM (Orthogonal Frequency-Division Multiplexing) inner modem receiver on a C-programmable CGA (Coarse Grain Array) processor with extensive SIMD (Single Instruction Multiple Data) support. By defining separate implementations for different combinations of modulation scheme and coding rate, up to 3-fold gains can be achieved in the average energy consumption.

signal processing systems | 2008

An implementation friendly low complexity multiplierless LLR generator for soft MIMO sphere decoders

Min Li; David Novo; Bruno Bougard; Frederik Naessens; L. Van der Perre; Francky Catthoor

When combined with advanced FEC techniques such as turbo codes and LDPC codes, soft-output MIMO sphere decoders significantly outperform hard-output sphere decoders. Hence, algorithms and implementations of soft-output sphere decoders have attracted intensive interest in recent years. Practical soft-output sphere decoder implementations often consist of a list generator and an LLR generator. Most existing implementations focus on the list generator, and the LLR generator is implemented in a relatively straightforward way. However, the LLR generator accounts for a great part of the complexity. Our contribution is an implementation-friendly, low-complexity, multiplierless LLR generator. We apply selective and incremental updating, algebraic simplifications and strength reductions to reduce the algorithmic complexity and to eliminate all multiplications. When integrated with the SSFE list generator, our scheme not only removes 100% of the multiplications, but also removes 26% to 83% of the additions, 76% to 94% of the bit-shifts and 63% to 91% of the memory operations. Besides the algorithmic aspects, we extract the key data-flow block with well-defined control signals. This can be easily mapped onto micro-architectures and implemented as the data-path in ASICs, or as a function unit in ASIPs.

signal processing systems | 2011

Exploration of Soft-Output MIMO Detector Implementations on Massive Parallel Processors

Robert Fasthuber; Min Li; David Novo; Praveen Raghavan; Liesbet Van der Perre; Francky Catthoor

Emerging Software Defined Radio (SDR) baseband platforms are based on multiple processors with massive parallelism. Although the computational power of these platforms would theoretically enable SDR solutions with advanced wireless signal processing, existing work still implements rather basic algorithms. For instance, current Multiple-Input Multiple-Output (MIMO) detector implementations are typically based on simple linear hard-output detection and not on advanced near-Maximum Likelihood (ML) soft-output detection. However, only the latter makes it possible to exploit the full potential of MIMO technology. In this work, we explore the feasibility of advanced soft-output near-ML MIMO detectors on massive parallel processors. Although such detectors are considered to be very challenging due to their high computational complexity, we combine architecture-friendly algorithm design, application-specific instructions and instruction-level/data-level parallelism explorations to make SDR solutions feasible. We show that, by applying the proposed combination of techniques, it is possible to obtain SDR implementations which can deliver data rates that are sufficient for future wireless systems. For example, a 2 × 4 Coarse Grain Array (CGA) processor with 16-way Single Instruction Multiple Data (SIMD) can deliver 192/368 Mbps throughput for 2 × 2 64/16-QAM transmissions. Finally, we estimate the area and power consumption of the programmable solution and compare it against a traditional Application Specific Integrated Circuit (ASIC) approach. This enables us to draw conclusions from the cost perspective.