Jan van Lunteren | Researchain

Publication

Featured research published by Jan van Lunteren.

design, automation, and test in europe | 2015

Memristor based computation-in-memory architecture for data-intensive applications

Said Hamdioui; Lei Xie; Hoang Anh Du Nguyen; Mottaqiallah Taouil; Koen Bertels; Henk Corporaal; Hailong Jiao; Francky Catthoor; Dirk Wouters; Linn Eike; Jan van Lunteren

One of the most critical challenges for today's and future data-intensive and big-data problems is data storage and analysis. This paper first highlights some challenges of the newborn Big Data paradigm and shows that the increase in data size has already surpassed the capabilities of today's computation architectures, which suffer from limited bandwidth, programmability overhead, energy inefficiency, and limited scalability. Thereafter, the paper introduces a new memristor-based architecture for data-intensive applications. The potential of such an architecture in solving data-intensive problems is illustrated by showing its capability to increase the computation efficiency, solve the communication bottleneck, reduce the leakage currents, etc. Finally, the paper discusses why memristor technology is very suitable for the realization of such an architecture; using memristors to implement dual functions (storage and logic) is illustrated.

international symposium on microarchitecture | 2012

Designing a Programmable Wire-Speed Regular-Expression Matching Accelerator

Jan van Lunteren; Christoph Hagleitner; Timothy Heil; Giora Biran; Uzi Shvadron; Kubilay Atasu

A growing number of applications rely on fast pattern matching to scan data in real-time for security and analytics purposes. The RegX accelerator in the IBM Power Edge of Network (PowerEN) processor supports these applications using a combination of fast programmable state machines and simple processing units to scan data streams against thousands of regular-expression patterns at state-of-the-art Ethernet link speeds. RegX employs a special rule cache and includes several new micro-architectural features that enable various instruction dispatch and execution options for the processing units. The architecture applies RISC philosophy to special-purpose computing: hardware provides fast, simple primitives, typically performed in a single cycle, which are exploited by an intelligent compiler and system software for high performance. This approach provides the flexibility required to achieve good performance across a wide range of workloads. As implemented in the PowerEN processor, the accelerator achieves a theoretical peak scan rate of 73.6 Gbit/s, and a measured scan rate of about 15 to 40 Gbit/s for typical intrusion detection workloads.

international conference on computer communications | 2012

Hardware-accelerated regular expression matching at multiple tens of Gb/s

Jan van Lunteren; Alexis Guanella

Hardware acceleration of regular expression matching is key to meeting the throughput requirements of state-of-the-art network intrusion detection systems (NIDSs) dictated by fast-growing link speeds. This paper presents extensions to a programmable state machine, called B-FSM, which was initially optimized for string matching. These extensions enable direct support in hardware for essential regular expression features, such as character classes and case insensitivity. Moreover, they also allow the exploitation of regular expression properties that show up at the data structure level as common transitions shared between multiple states, resulting in storage reductions of up to 95% for five NIDS pattern sets analyzed. Additional instruction support based on a flexible integration within the B-FSM data structure increases the processing capabilities and enables the scaling to larger pattern collections. The new IBM Power Edge of Network™ processor employs the B-FSM technology to provide scanning capabilities at typical rates of 20-40 Gb/s.
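The idea of "common transitions shared between multiple states" can be illustrated with a generic default-transition compression. The sketch below is not the B-FSM data structure described in the paper; the function names (`build_table`, `compress`, `step`) and the toy pattern are assumptions made purely for illustration.

```python
from collections import Counter

def build_table(pattern, alphabet):
    """Build a dense DFA table that tracks how much of `pattern` has been matched."""
    table = {}
    for state in range(len(pattern) + 1):
        row = {}
        for sym in alphabet:
            # Next state = length of the longest pattern prefix that is a suffix
            # of the text seen so far (matched prefix plus the new symbol).
            seen = pattern[:state] + sym
            k = min(len(pattern), len(seen))
            while k > 0 and not seen.endswith(pattern[:k]):
                k -= 1
            row[sym] = k
        table[state] = row
    return table

def compress(table):
    """Split a dense table into per-symbol shared defaults and per-state exceptions."""
    symbols = sorted(next(iter(table.values())))
    # For each symbol, the most common target across all states becomes the default.
    defaults = {
        s: Counter(row[s] for row in table.values()).most_common(1)[0][0]
        for s in symbols
    }
    # Only transitions that differ from the shared default are stored per state.
    exceptions = {
        state: {s: t for s, t in row.items() if t != defaults[s]}
        for state, row in table.items()
    }
    return defaults, exceptions

def step(defaults, exceptions, state, symbol):
    """Look up a transition: a state-specific exception wins over the shared default."""
    return exceptions[state].get(symbol, defaults[symbol])

table = build_table("abc", "abcx")
defaults, exceptions = compress(table)
dense_entries = sum(len(row) for row in table.values())
stored_entries = len(defaults) + sum(len(e) for e in exceptions.values())
print(dense_entries, stored_entries)
```

For this toy automaton the dense table holds 16 entries, while the shared defaults plus exceptions hold 6. The saving on real NIDS pattern sets depends on the patterns and on the actual hardware data structure, which this sketch does not model.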

international conference on hardware/software codesign and system synthesis | 2009

Memory-efficient distribution of regular expressions for fast deep packet inspection

Jonathan Rohrer; Kubilay Atasu; Jan van Lunteren; Christoph Hagleitner

Current trends in network security force network intrusion detection systems (NIDS) to scan network traffic at wirespeed beyond 10 Gbps against increasingly complex patterns, often specified using regular expressions. As a result, dedicated regular-expression accelerators have recently received considerable attention. The storage efficiency of the compiled patterns is a key factor in the overall performance and critically depends on the distribution of the patterns to a limited number of parallel pattern-matching engines. In this work, we first present a formal definition and complexity analysis of the pattern distribution problem and then introduce optimal and heuristic methods to solve it. Our experiments with five sets of regular expressions from both public and proprietary NIDS result in an up to 8.8x better storage efficiency than the state of the art. The average improvement is 2.3x.

international parallel and distributed processing symposium | 2013

Hardware-Accelerated Regular Expression Matching with Overlap Handling on IBM PowerEN Processor

Kubilay Atasu; Florian Doerfler; Jan van Lunteren; Christoph Hagleitner

Programmable hardware accelerators for regular expression (regex) matching are evolving into increasingly complex stream processors, which involve multiple state machines that operate in parallel, and specialized post-processors that can process instructions dispatched by the state machines. To improve the speed and the storage efficiency, complex regexes are decomposed into simpler subexpressions, where each subexpression can fire one or more instructions. Although the impact of regex decompositions on the storage efficiency is well known, little has been done to address the correctness and completeness. We show that regex decompositions can result in false positives if overlaps between subexpressions are not taken into account. We describe formal methods to recognize various types of subexpression overlaps that can arise in regex decompositions. We also describe efficient post-processing techniques to eliminate the associated false positives. To enable efficient mapping of the decomposed regexes to the post-processors, we propose integer-programming-based register allocation methods. Our methods pack narrow variables to reduce the register and instruction usage, and take advantage of multi-register reset instructions to reduce the number of instructions that must be executed in parallel. Experiments on regex sets obtained from open-source and proprietary network intrusion detection systems demonstrate orders of magnitude improvement in the storage efficiency over the state of the art.
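A small example may help show how an overlap between subexpressions produces a false positive. The sketch below is a toy illustration, not the formal method or the post-processor instructions from the paper: it assumes the regex `ab.*bc` is decomposed into the literal subexpressions `ab` and `bc`, and the function names are hypothetical.

```python
import re

def naive_decomposed_match(text, first, second):
    """Set a flag when `first` is seen; report a match when `second` is seen later.

    This ignores overlaps: `second` may start inside the occurrence of `first`.
    """
    seen_first = False
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if seen_first and prefix.endswith(second):
            return True
        if prefix.endswith(first):
            seen_first = True
    return False

def overlap_aware_match(text, first, second):
    """Like the naive version, but remember where `first` ended.

    A match is reported only if `second` starts at or after that offset.
    Tracking the earliest occurrence of `first` is enough, because any later
    occurrence ends further to the right.
    """
    first_end = None
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if first_end is not None and prefix.endswith(second):
            if end - len(second) >= first_end:
                return True
        if first_end is None and prefix.endswith(first):
            first_end = end
    return False

for text in ["abc", "abbc", "ab--bc", "bcab"]:
    reference = bool(re.search(r"ab.*bc", text))
    print(text, reference,
          naive_decomposed_match(text, "ab", "bc"),
          overlap_aware_match(text, "ab", "bc"))
```

On the input `abc`, the two subexpressions share the character `b`, so the naive scheme reports a match that the original regex does not accept. Storing one extra offset removes the false positive in this simple case; general regexes need the more careful overlap analysis the abstract refers to.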

computing frontiers | 2015

An energy-efficient custom architecture for the SKA1-low central signal processor

Leandro Fiorin; Erik Vermij; Jan van Lunteren; Rik Jongerius; Christoph Hagleitner

The Square Kilometre Array (SKA) will be the biggest radio telescope ever built, with unprecedented sensitivity, angular resolution, and survey speed. This paper explores the design of a custom architecture for the central signal processor (CSP) of the SKA1-Low, the SKA's aperture-array instrument consisting of 131,072 antennas. The SKA1-Low's antennas receive signals between 50 and 350 MHz. After digitization and preliminary processing, samples are moved to the CSP for further processing. In this work, we describe the challenges in building the CSP, and present a first quantitative study for the implementation of a custom hardware architecture for processing the main CSP algorithms. By taking advantage of emerging 3D-stacked-memory devices and by exploring the design space for a 14-nm implementation, we estimate a power consumption of 14.4 W for processing all channels of a sub-band and an energy efficiency at application level of up to 208 GFLOPS/W for our architecture.

international conference on acoustics, speech, and signal processing | 2014

Scalable, efficient ASICs for the square kilometre array: From A/D conversion to central correlation

Martin L. Schmatz; Rik Jongerius; Gero Dittmann; Andreea Anghel; Ton Engbersen; Jan van Lunteren; Peter Buchmann

The Square Kilometre Array (SKA) is a future radio telescope, currently being designed by the worldwide radio-astronomy community. During the first of two construction phases, more than 250,000 antennas will be deployed, clustered in aperture-array stations. The antennas will generate 2.5 Pb/s of data, which needs to be processed in real time. For the processing stages from A/D conversion to central correlation, we propose an ASIC solution using only three chip architectures. The architecture is scalable - additional chips support additional antennas or beams - and versatile - it can relocate its receiver band within a range of a few MHz up to 4 GHz. This flexibility makes it applicable to both SKA phases 1 and 2. The proposed chips implement an antenna and station processor for 289 antennas with a power consumption on the order of 600 W and a correlator, including corner turn, for 911 stations on the order of 90 kW.
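The figures quoted above can be turned into rough per-unit numbers. The short calculation below is only back-of-the-envelope arithmetic on the abstract's rounded values; it assumes exactly 250,000 antennas (the abstract says "more than"), so the per-antenna data rate is an upper bound, and it treats the "on the order of" power figures as exact.

```python
# Values taken from the abstract; treated as exact for illustration only.
total_rate_bps = 2.5e15        # 2.5 Pb/s aggregate antenna data rate
antennas = 250_000             # lower bound on the phase-1 antenna count
station_power_w = 600.0        # antenna and station processor power
antennas_per_station = 289
correlator_power_w = 90_000.0  # correlator power, including corner turn
stations = 911

per_antenna_gbps = total_rate_bps / antennas / 1e9
station_w_per_antenna = station_power_w / antennas_per_station
correlator_w_per_station = correlator_power_w / stations

print(f"data rate per antenna: at most {per_antenna_gbps:.1f} Gb/s")
print(f"station processing power per antenna: about {station_w_per_antenna:.2f} W")
print(f"correlator power per station: about {correlator_w_per_station:.1f} W")
```

These derived values are not stated in the abstract; they only indicate the scale of the per-antenna and per-station budgets implied by the quoted totals.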

computing frontiers | 2016

An architecture for near-data processing systems

Erik Vermij; Christoph Hagleitner; Leandro Fiorin; Rik Jongerius; Jan van Lunteren; Koen Bertels

Near-data processing is a promising paradigm to address the bandwidth, latency, and energy limitations in today's computer systems. In this work, we introduce an architecture that enhances a contemporary multi-core CPU with new features for supporting a seamless integration of near-data processing capabilities. Crucial aspects such as coherency, data placement, communication, address translation, and the programming model are discussed. The essential components, as well as a system simulator, are realized in hardware and software. Results for the important Graph500 benchmark show a 1.5x speedup when using the proposed architecture.

ieee hot chips symposium | 2006

A novel processor architecture for high-performance stream processing

Jan van Lunteren

This article consists of a collection of slides from the author's conference presentation on a novel processor architecture for high-performance stream processing. Some of the specific topics discussed include: an introduction to the technology and applications supported; high-level concept design; programmable state machine; novel processor technologies; instruction cache and prefetch; and experimental results for testing the performance output.

software and compilers for embedded systems | 2016

Scalable DFA Compilation for High-Performance Regular-Expression Matching

Jan van Lunteren

Regular-expression accelerators often rely on sophisticated compilers to fully exploit the available hardware capabilities for achieving wire-speed scan rates of multiple tens of gigabits per second. This paper presents a method for the efficient compilation of pattern-matching functions specified by deterministic finite automata (DFAs) into executable structures targeted at accelerators based on B-FSM programmable state machines. The compilation scheme presented is able to effectively exploit an adaptive compression mechanism to obtain one of the most compact state-transition-table structures in the industry, in combination with fast compilation times. The heuristic-based approach scales to very large DFAs having tens of millions of transitions, while achieving an approximately linear growth of the storage needs as a function of the DFA size.
