Jochem H. Rutgers | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Jochem H. Rutgers is active.

Explore More

Publication

Featured researches published by Jochem H. Rutgers.

parallel computing | 2013

Fixed latency on-chip interconnect for hardware spiking neural network architectures

Sandeep Pande; Fearghal Morgan; Gerard Smit; Tom Bruintjes; Jochem H. Rutgers; Brian McGinley; Seamus Cawley; Jim Harkin; Liam McDaid

Information in a Spiking Neural Network (SNN) is encoded as the relative timing between spikes. Distortion in spike timings can impact the accuracy of SNN operation by modifying the precise firing time of neurons within the SNN. Maintaining the integrity of spike timings is crucial for reliable operation of SNN applications. A packet switched Network on Chip (NoC) infrastructure offers scalable connectivity for spike communication in hardware SNN architectures. However, shared resources in NoC architectures can result in unwanted variation in spike packet transfer latency. This packet latency jitter distorts the timing information conveyed on the synaptic connections in the SNN, resulting in unreliable application behaviour. This paper presents a SystemC simulation based analysis of the synaptic information distortion in NoC based hardware SNNs. The paper proposes a fixed spike transfer latency ring topology interconnect for spike communication between neural tiles, using a novel timestamped spike broadcast flow control scheme. The proposed architectural technique is evaluated using spike rates employed in previously reported mesh topology NoC based hardware SNN applications, which exhibited spike latency jitter over NoC paths. Results indicate that the proposed interconnect offers fixed spike transfer latency and eliminates the associated information distortion. The paper presents the micro-architecture of the proposed ring router. The FPGA validated ring interconnect architecture has been synthesised using 65nm low-power CMOS technology. Silicon area comparisons for various ring sizes are presented. Scalability of the proposed architecture has been addressed by employing a hierarchical NoC architecture.

digital systems design | 2010

An Approximate Maximum Common Subgraph Algorithm for Large Digital Circuits

Jochem H. Rutgers; Pascal T. Wolkotte; P.K.F. Holzenspies; Jan Kuper; Gerard Smit

This paper presents an approximate Maximum Common Sub graph (MCS) algorithm, specifically for directed, cyclic graphs representing digital circuits. Because of the application domain, the graphs have nice properties: they are very sparse, have many different labels, and most vertices have only one predecessor. The algorithm iterates over all vertices once and uses heuristics to find the MCS. It is linear in computational complexity with respect to the size of the graph. Experiments show that very large common sub graphs were found in graphs of up to 200,000 vertices within a few minutes, when a quarter or less of the graphs differ. The variation in run-time and quality of the result is low.

international conference on embedded computer systems architectures modeling and simulation | 2012

An efficient asymmetric distributed lock for embedded multiprocessor systems

Jochem H. Rutgers; Marco Jan Gerrit Bekooij; Gerardus Johannes Maria Smit

Efficient synchronization is a key concern in an embedded many-core system-on-chip (SoC). The use of atomic read-modify-write instructions combined with cache coherency as synchronization primitive is not always an option for shared-memory SoCs due to the lack of suitable IP. Furthermore, there are doubts about the scalability of hardware cache coherency protocols. Existing distributed locks for NUMA multiprocessor systems do not rely on cache coherency and are more scalable, but exchange many messages per lock. This paper introduces an asymmetric distributed lock algorithm for shared-memory embedded multiprocessor systems without hardware cache coherency. Messages are exchanged via a low-cost inter-processor communication ring in combination with a small local memory per processor. Typically, a mutex is used over and over again by the same process, which is exploited by our algorithm. As a result, the number of messages exchanged per lock is significantly reduced. Experiments with our 32-core system show that when having locks in SDRAM, 35% of the memory traffic is lock related. In comparison, our solution eliminates all of this traffic and reduces the execution time by up to 89%.

ieee international symposium on parallel & distributed processing, workshops and phd forum | 2013

Portable Memory Consistency for Software Managed Distributed Memory in Many-Core SoC

Jochem H. Rutgers; Marco Jan Gerrit Bekooij; Gerardus Johannes Maria Smit

Porting software to different platforms can require modifications of the application. One of the issues is that the targeted hardware supports another memory consistency model. As a consequence, the completion order of reads and writes in a multi-threaded application can change, which may result in improper synchronization. For example, a processor with out-of-order execution could break synchronization if proper fence instructions are missing. Such a bug can cause sporadic errors, which are hard to debug. This paper presents an approach that makes applications independent of the memory model of the hardware, hence they can be compiled to hardware with any memory architecture. The key is having a memory model that only guarantees the most fundamental orderings of reads and writes, and annotations to specify additional ordering constraints. As a result, tooling can transparently and properly implement fences, cache flushes, etc. when appropriate, without losing flexibility of the hardware design. In a case study, several SPLASH-2 applications are run on a 32-core software cache coherent Micro Blaze system in FPGA. Moreover, this approach also allows mapping to scratch-pad memories and a distributed shared memory architecture.

International Journal of Psychophysiology | 2012