Kees A. Vissers | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Kees A. Vissers is active.

Explore More

Publication

Featured researches published by Kees A. Vissers.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2011

High-Level Synthesis for FPGAs: From Prototyping to Deployment

Jason Cong; Bin Liu; Stephen Neuendorffer; Juanjo Noguera; Kees A. Vissers; Zhiru Zhang

Escalating system-on-chip design complexity is pushing the design community to raise the level of abstraction beyond register transfer level. Despite the unsuccessful adoptions of early generations of commercial high-level synthesis (HLS) systems, we believe that the tipping point for transitioning to HLS msystem-on-chip design complexityethodology is happening now, especially for field-programmable gate array (FPGA) designs. The latest generation of HLS tools has made significant progress in providing wide language coverage and robust compilation technology, platform-based modeling, advancement in core HLS algorithms, and a domain-specific approach. In this paper, we use AutoESLs AutoPilot HLS tool coupled with domain-specific system-level implementation platforms developed by Xilinx as an example to demonstrate the effectiveness of state-of-art C-to-FPGA synthesis solutions targeting multiple application domains. Complex industrial designs targeting Xilinx FPGAs are also presented as case studies, including comparison of HLS solutions versus optimized manual designs. In particular, the experiment on a sphere decoder shows that the HLS solution can achieve an 11-31% reduction in FPGA resource usage with improved design productivity compared to hand-coded design.

design automation conference | 2000

YAPI: application modeling for signal processing systems

E.A. de Kock; Gerben Essink; W. J. M. Smits; R. van der Wolf; J.-Y. Brunei; Wido Kruijtzer; Paul Lieverse; Kees A. Vissers

We present a programming interface called YAPI to model signal processing applications as process networks. The purpose of YAPI is to enable the reuse of signal processing applications and the mapping of signal processing applications onto heterogeneous systems that contain hardware and software components. To this end, YAPI separates the concerns of the application programmer, who determines the functionality of the system, and the system designer, who determines the implementation of the functionality. The proposed model of computation extends the existing model of Kahn process networks with channel selection to support non-deterministic events. We provide an efficient implementation of YAPI in the form of a C++ run-time library to execute the applications on a workstation. Subsequently, the applications are used by the system designer as input for mapping and performance analysis in the design of complex signal processing systems. We evaluate this methodology on the design of a digital video broadcast system-on-chip.

signal processing systems | 1999

A methodology for architecture exploration of heterogeneous signal processing systems

Paul Lieverse; P. van der Wolf; Ed F. Deprettere; Kees A. Vissers

We present a methodology for the exploration of signal processing architectures at the system level. The methodology, named SPADE, provides a means to quickly build models of architectures at an abstract level, to easily map applications, modeled as Kahn Process Networks, onto these architecture models, and to analyze the performance of the resulting system by simulation. The methodology distinguishes between applications and architectures, and uses a trace-driven simulation technique for co-simulation of application models and architecture models. As a consequence, architecture models need not be functionally complete to be used for performance analysis while data dependent behavior is still handled correctly. We have used the methodology for the exploration of architectures and mappings of an MPEG-2 video decoder application.

field programmable gate arrays | 2004

A quantitative analysis of the speedup factors of FPGAs over processors

Zhi Guo; Walid A. Najjar; Frank Vahid; Kees A. Vissers

The speedup over a microprocessor that can be achieved by implementing some programs on an FPGA has been extensively reported. This paper presents an analysis, both quantitative and qualitative, at the architecture level of the components of this speedup. Obviously, the spatial parallelism that can be exploited on the FPGA is a big component. By itself, however, it does not account for the whole speedup.In this paper we experimentally analyze the remaining components of the speedup. We compare the performance of image processing application programs executing in hardware on a Xilinx Virtex E2000 FPGA to that on three general-purpose processor platforms: MIPS, Pentium III and VLIW. The question we set out to answer is what is the inherent advantage of a hardware implementation over a von Neumann platform. On the one hand, the clock frequency of general-purpose processors is about 20 times that of typical FPGA implementations. On the other hand, the iteration level parallelism on the FPGA is one to two orders of magnitude that on the CPUs. In addition to these two factors, we identify the efficiency advantage of FPGAs as an important factor and show that it ranges from 6 to 47 on our test benchmarks. We also identify some of the components of this factor: the streaming of data from memory, the overlap of control and data flow and the elimination of some instruction on the FPGA. The results provide a deeper understanding of the tradeoff between system complexity and performance when designing Configurable SoC as well as designing software for CSoC. They also help understand the one to two orders of magnitude in speedup of FPGAs over CPU after accounting for clock frequencies.

design, automation, and test in europe | 2005

Optimized Generation of Data-Path from C Codes for FPGAs

Zhi Guo; Betul Buyukkurt; Walid A. Najjar; Kees A. Vissers

FPGAs, as computing devices, offer significant speedup over microprocessors. Furthermore, their configurability offers an advantage over traditional ASICs. However, they do not yet enjoy high-level language programmability, as microprocessors do. This has become the main obstacle for their wider acceptance by application designers. ROCCC is a compiler designed to generate circuits from C source code to execute on FPGAs, more specifically on CSoCs. It generates RTL level HDLs from frequently executing kernels in an application. In this paper, we describe the ROCCCs system overview and focus on its data path generation. We compare the performance of ROCCC-generated VHDL code with that of Xilinx IPs. The synthesis result shows that the ROCCC-generated circuit takes around 2/spl times//spl sim/3/spl times/ the area and runs at a comparable clock rate.

field programmable gate arrays | 2017

FINN: A Framework for Fast, Scalable Binarized Neural Network Inference

Yaman Umuroglu; Nicholas J. Fraser; Giulio Gambardella; Michaela Blott; Philip Heng Wai Leong; Magnus Jahre; Kees A. Vissers

Research has shown that convolutional neural networks contain significant redundancy, and high classification accuracy can be obtained even when weights and activations are reduced from floating point to binary values. In this paper, we present FINN, a framework for building fast and flexible FPGA accelerators using a flexible heterogeneous streaming architecture. By utilizing a novel set of optimizations that enable efficient mapping of binarized neural networks to hardware, we implement fully connected, convolutional and pooling layers, with per-layer compute resources being tailored to user-provided throughput requirements. On a ZC706 embedded FPGA platform drawing less than 25 W total system power, we demonstrate up to 12.3 million image classifications per second with 0.31 μs latency on the MNIST dataset with 95.8% accuracy, and 21906 image classifications per second with 283 μs latency on the CIFAR-10 and SVHN datasets with respectively 80.1% and 94.9% accuracy. To the best of our knowledge, ours are the fastest classification rates reported to date on these benchmarks.

International Workshop on Embedded Computer Systems | 2001

A Methodology to Design Programmable Embedded Systems

Bart Kienhuis; Ed F. Deprettere; Pieter van der Wolf; Kees A. Vissers

Embedded systems architectures are increasingly becoming programmable, which means that an architecture can execute a set of applications instead of only one. This makes these systems cost-effective, as the same resources can be reused for another application by reprogramming the system. To design these programmable architectures, we present in this article a number of concepts of which one is the Y-chart approach. These concepts allow designers to perform a systematic exploration of the design space of architectures. Since this design space may be huge, it is narrowed down in a number of steps. The concepts presented in this article provide a methodology in which architectures can be obtained that satisfies a set of constraints while establishing enough flexibility to support a given set of applications.

Proceedings of the Seventh International Workshop on Hardware/Software Codesign (CODES'99) (IEEE Cat. No.99TH8450) | 1999

An MPEG-2 decoder case study as a driver for a system level design methodology

Pieter van der Wolf; Paul Lieverse; Mudit Goel; David La Hei; Kees A. Vissers

We present a case study on the design of a heterogeneous architecture for MPEG-2 video decoding. The primary objective of the case study is the validation of the SPADE methodology for architecture exploration. The case study demonstrates that this methodology provides a structured approach to the efficient evaluation of the performance of candidate architectures for selected benchmark applications. We learned that the MPEG-2 decoder can conveniently be modeled as a Kahn process network using a simple API. Abstract models of architectures can be constructed efficiently using a library of generic building blocks. A trace driven simulation technique enables the use of these abstract models for performance analysis with correct handling of data dependent behavior. We performed a design space exploration to derive how the performance of the decoder depends on the busload and the frame rate.

field programmable logic and applications | 2002

Field-Programmable Custom Computing Machines - A Taxonomy -

Mihai Sima; Stamatis Vassiliadis; Sorin Cotofana; Jos van Eijndhoven; Kees A. Vissers

The ability for providing a hardware platformwhich can be customized on a per-application basis under software control has established Reconfigurable Computing (RC) as a new computing paradigm. A machine employing the RC paradigm is referred to as a Field-Programmable Custom Computing Machine (FCCM). So far, the FCCMs have been classified according to implementation criteria. For the previous classifications do not reveal the entire meaning of the RC paradigm, we propose to classify the FCCMs according to architectural criteria. To analyze the phenomena inside FCCMs, we introduce a formalism based on microcode, in which any custom operation performed by a field-programmed computing facility is executed as a microprogram with two basic stages: SET CONFIGURATION and EXECUTE CUSTOM OPERATION. Based on the SET/EXECUTE formalism, we then propose an architectural-based taxonomy of FCCMs.

field-programmable logic and applications | 2013

A flexible hash table design for 10GBPS key-value stores on FPGAS

Zsolt István; Gustavo Alonso; Michaela Blott; Kees A. Vissers

Common web infrastructure relies on distributed main memory key-value stores to reduce access load on databases, thereby improving both performance and scalability of web sites. As standard cloud servers provide sub-linear scalability and reduced power efficiency to these kinds of scale-out workloads, we have investigated a novel dataflow architecture for key-value stores with the aid of FPGAs which can deliver consistent 10Gbps throughput. In this paper, we present the design of a novel hash table which forms the centre piece of this dataflow architecture. The fully pipelined design can sustain consistent 10Gbps line-rate performance by deploying a concurrent mechanism to handle hash collisions. We address problems such as support for a broad range of key sizes without stalling the pipeline through careful matching of lookup time with packet reception time. Finally, the design is based on a scalable architecture that can be easily parametrized to work with different memory types operating at different access speeds and latencies. We deployed this hash table in a memcached prototype to index 2 million entries in 24GBytes of external DDR3 DRAM while sustaining 13 million requests per second for UDP binary encoded memcached packets which is the maximum packet rate that can be achieved with memcached on a 10Gbps link.

Explore More