Is this you? Create Your Porfile

Razvan Nane

Delft University of Technology

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Razvan Nane is active.

Explore More

Publication

Featured researches published by Razvan Nane.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2016

A Survey and Evaluation of FPGA High-Level Synthesis Tools

Razvan Nane; Vlad Mihai Sima; Christian Pilato; Jongsok Choi; Blair Fort; Andrew Canis; Yu Ting Chen; Hsuan Hsiao; Stephen Dean Brown; Fabrizio Ferrandi; Jason Helge Anderson; Koen Bertels

High-level synthesis (HLS) is increasingly popular for the design of high-performance and energy-efficient heterogeneous systems, shortening time-to-market and addressing todays system complexity. HLS allows designers to work at a higher-level of abstraction by using a software program to specify the hardware functionality. Additionally, HLS is particularly interesting for designing field-programmable gate array circuits, where hardware implementations can be easily refined and replaced in the target device. Recent years have seen much activity in the HLS research community, with a plethora of HLS tool offerings, from both industry and academia. All these tools may have different input languages, perform different internal optimizations, and produce results of different quality, even for the very same input description. Hence, it is challenging to compare their performance and understand which is the best for the hardware to be implemented. We present a comprehensive analysis of recent HLS tools, as well as overview the areas of active interest in the HLS research community. We also present a first-published methodology to evaluate different HLS tools. We use our methodology to compare one commercial and three academic tools on a common set of C benchmarks, aiming at performing an in-depth evaluation in terms of performance and the use of resources.

field programmable logic and applications | 2012

DWARV 2.0: A CoSy-based C-to-VHDL hardware compiler

Razvan Nane; Vlad Mihai Sima; Bryan Olivier; Roel Meeuws; Yana Yankova; Koen Bertels

In the last decade, a considerable amount of effort was spent on raising the implementation level of hardware systems by automatically extracting the parallelism from input applications and using tools to generate Hardware/Software co-design solutions. However, the tools developed thus far either focus on particular application domains or they impose severe restrictions on the input language. In this paper, we present the DWARV 2.0 compiler that accepts general C-code as input and generates synthesizable VHDL for unrestricted application domains. Dissimilar to previous hardware compilers, this implementation is based on CoSy compiler framework. This allowed us to build a highly modular compiler in which standard or custom optimizations can be easily integrated. Validation experiments showed speed-ups of up to 4.41× when comparing against another state of the art hardware compiler.

Reconfigurable Computing-From FPGAs to Hardware/Software Codesign. Ed.: J. M. P. Cardoso | 2011

REFLECT: Rendering FPGAs to Multi-core Embedded Computing

João M. P. Cardoso; Pedro C. Diniz; Zlatko Petrov; Koen Bertels; Michael Hübner; Hans van Someren; Fernando M. Gonçalves; José Gabriel F. Coutinho; George A. Constantinides; Bryan Olivier; Wayne Luk; Juergen Becker; Georgi Kuzmanov; Florian Thoma; Lars Braun; Matthias Kühnle; Razvan Nane; Vlad Mihai Sima; Kamil Krátký; José Carlos Alves; João Canas Ferreira

The relentless increase in capacity of Field-Programmable Gate-Arrays (FPGAs) has made them vehicles of choice for both prototypes and final products requiring on-chip multi-core, heterogeneous and reconfigurable systems. Multiple cores can be embedded as hard- or soft-macros, have customizable instruction sets, multiple distributed RAMs and/or configurable interconnections. Their flexibility allows them to achieve orders of magnitude better performance than conventional computing systems via customization. Programming these systems, however, is extremely cumbersome and error-prone and as a result their true potential is only achieved very often at unreasonably high design efforts. This project covers developing, implementing and evaluating a novel compilation and synthesis system approach for FPGA-based platforms. We rely on Aspect-Oriented Specifications to convey critical domain knowledge to a mapping engine while preserving the advantages of a high-level imperative programming paradigm in early software development as well as program and application portability. We leverage Aspect-Oriented specifications and a set of transformations to generate an intermediate representation suitable to hardware mapping. A programming language, LARA, will allow the exploration of alternative architectures and design patterns enabling the generation of flexible hardware cores that can be easily incorporated into larger multi-core designs. We will evaluate the effectiveness of the proposed approach using partner-provided codes from the domain of audio processing and real-time avionics. We expect the technology developed in REFLECT to be integrated by our industrial partners, in particular by ACE, a leading compilation tool supplier for embedded systems, and by Honeywell, a worldwide solution supplier of embedded high-performance systems.

international symposium on nanoscale architectures | 2015

Computation-in-memory based parallel adder

Hoang Anh Du Nguyen; Lei Xie; Mottaqiallah Taouil; Razvan Nane; Said Hamdioui; Koen Bertels

Todays computing systems suffer from memory/communication bottleneck, resulting in energy and performance inefficiency. This makes them incapable to solve data-intensive applications within economically acceptable limits. Computation-In-Memory (CIM) architecture, based on the integration of storage and computation in the same physical location using non-volatile memristor technology offers a potential solution for the memory bottleneck. This paper presents a CIM based parallel adder, and shows its potentials and superiority for intensive computing and massive parallelism by comparing it with state-of-the art computing systems including multicore, GPU and FPGA architecture. The results show that CIM based parallel adder can achieve at least two orders of magnitude improvement in computational efficiency, energy efficiency and area efficiency.

Microprocessors and Microsystems | 2013

Controlling a complete hardware synthesis toolchain with LARA aspects

João M. P. Cardoso; Tiago Carvalho; José Gabriel F. Coutinho; Ricardo Nobre; Razvan Nane; Pedro C. Diniz; Zlatko Petrov; Wayne Luk; Koen Bertels

The synthesis and mapping of applications to configurable embedded systems is a notoriously complex process. Design-flows typically include tools that have a wide range of parameters which interact in very unpredictable ways, thus creating a large and complex design space. When exploring this space, designers must manage the interfaces between different tools and apply, often manually, a sequence of tool-specific transformations making design exploration extremely cumbersome and error-prone. This paper describes the use of techniques inspired by aspect-oriented technology and scripting languages for defining and exploring hardware compilation strategies. In particular, our approach allows developers to control all stages of a hardware/software compilation and synthesis toolchain: from code transformations and compiler optimizations to placement and routing for tuning the performance of application kernels. Our approach takes advantage of an integrated framework which provides a transparent and unified view over toolchains, their data output and the control of their execution. We illustrate the use of our approach when designing application-specific hardware architectures generated by a toolchain composed of high-level source-code transformation and synthesis tools. The results show the impact of various strategies when targeting custom hardware and expose the complexities in devising these strategies, hence highlighting the productivity benefits of this approach.

international conference on high performance computing and simulation | 2016

Parallel matrix multiplication on memristor-based computation-in-memory architecture

Adib Haron; Jintao Yu; Razvan Nane; Mottaqiallah Taouil; Said Hamdioui; Koen Bertels

One of the most important constraints of todays architectures for data-intensive applications is the limited bandwidth due to the memory-processor communication bottleneck. This significantly impacts performance and energy. For instance, the energy consumption share of communication and memory access may exceed 80%. Recently, the concept of Computation-in-Memory (CIM) was proposed, which is based on the integration of storage and computation in the same physical location using a crossbar topology and non-volatile resistive-switching memristor technology. To illustrate the tremendous potential of CIM architecture in exploiting massively parallel computation while reducing the communication overhead, we present a communication-efficient mapping of a large-scale matrix multiplication algorithm on the CIM architecture. The experimental results show that, depending on the matrix size, CIM architecture exhibits several orders of magnitude higher performance in total execution time and two orders of magnitude better in total energy consumption than the multicore-based on the shared memory architecture.

ACM Transactions on Reconfigurable Technology and Systems | 2013

Quipu: A Statistical Model for Predicting Hardware Resources

Roel Meeuws; S. Arash Ostadzadeh; Carlo Galuzzi; Vlad Mihai Sima; Razvan Nane; Koen Bertels

There has been a steady increase in the utilization of heterogeneous architectures to tackle the growing need for computing performance and low-power systems. The execution of computation-intensive functions on specialized hardware enables to achieve substantial speedups and power savings. However, with a large legacy code base and software engineering experts, it is not at all obvious how to easily utilize these new architectures. As a result, there is a need for comprehensive tool support to bridge the knowledge gap of many engineers as well as to retarget legacy code. In this article, we present the Quipu modeling approach, which consists of a set of tools and a modeling methodology that can generate hardware estimation models, which provide valuable information for developers. This information helps to focus their efforts, to partition their application, and to select the right heterogeneous components. We present Quipu’s capability to generate domain-specific models, that are up to several times more accurate within their particular domain (error: 4.6%) as compared to domain-agnostic models (error: 23%). Finally, we show how Quipu can generate models for a new toolchain and platform within a few days.

IEEE Transactions on Very Large Scale Integration Systems | 2017

On the Implementation of Computation-in-Memory Parallel Adder

Hoang Anh Du Nguyen; Lei Xie; Mottaqiallah Taouil; Razvan Nane; Said Hamdioui; Koen Bertels

Today’s computer architectures suffer from many challenges, such as the near end of CMOS downscaling, the memory/communication bottleneck, the power wall, and the programming complexity. As a consequence, these architectures become inefficient in solving big data problems or general data intensive applications. Computation-in-memory (CIM) is a novel architecture that tries to solve/alleviate the impact of these challenges using the same device (i.e., the memristor) to implement the processor and memory in the same physical crossbar. In order to analyze its feasibility in depth, this paper proposes two memristor implementations of a data intensive arithmetic application (i.e., parallel addition). To the best of our knowledge, this is the first paper that considers the cost of the entire architecture including both crossbar and its CMOS controller. The results show that CIM architecture in general and the CIM parallel adder in particular have a high scalability. CIM parallel adder achieves at least two orders of magnitude improvement in energy and area in comparison with a multicore-based parallel adder. Moreover, due to a wide variety of memristor design methods (such as Boolean logic), tradeoffs can be made between the area, delay, and energy consumption.

embedded and ubiquitous computing | 2014

High-Level Synthesis in the Delft Workbench Hardware/Software Co-design Tool-Chain

Razvan Nane; Vlad Mihai Sima; Cuong Pham Quoc; Fernando M. Gonçalves; Koen Bertels

High-level synthesis (HLS) is an automated design process that deals with the generation of behavioral hardware descriptions from high-level algorithmic specifications. The main benefit of this approach is that ever-increasing system-on-chip (SoC) design complexity and ever-shorter time-to-market can still be both manageable and achievable. This advantage, coupled with the increasing number of available heterogeneous platforms that loosely couple general-purpose processors with Field Programmable Gate Array-based co-processors, led to an increasing attention for HLS tool development and optimization from both the academia as well as the industry. However, in order for HLS to fully reach its potential, it is imperative to look simultaneously at local HLS optimizations as well as to HLS system-level integration and design space exploration issues. In this paper, we present the Delft Workbench tool-chain that takes C-code as input and generates, in a semi-automatic way, a complete system. Subsequently, we describe the design and output code optimization of the DWARV 3.0 HLS compiler using the CoSy compiler framework. Based on this experience, we provide an overview of similarities and differences in leveraging this commercial compiler framework to build a hardware compiler as opposed to building a software compiler. Finally, we report speedups up to 3.72x at application level and development times measurable in hours rather than weeks.

international symposium on nanoscale architectures | 2016

Skeleton-based design and simulation flow for Computation-in-Memory architectures

Jintao Yu; Razvan Nane; Adib Haron; Said Hamdioui; Henk Corporaal; Koen Bertels

Memristor-based Computation-in-Memory is one of the emerging architectures proposed to deal with Big Data problems. The design of such architectures requires a radically new automatic design flow because the memristor is a passive device that uses resistance to encode its logic value. This paper proposes a design flow for mapping parallel algorithms on the CIM architecture. Algorithms with similar data flow graphs can be mapped on the crossbar using the same template containing scheduling, placement, and routing information; this template is named skeleton. By configuring such a skeleton with different pre-designed circuits, we can build CIM implementations of the corresponding algorithms in that class. This approach does not only map an algorithm on a memristor crossbar, but also gives an estimation of its performance, area, and energy consumption. It also supports user-defined constraints and parallel SystemC simulation. Experimental results demonstrate the feasibility and the potential of the approach.

Explore More