Raphael Poss | Researchain

Publication

Featured researches published by Raphael Poss.

rapid simulation and performance evaluation methods and tools | 2012

Heterogeneous integration to simplify many-core architecture simulations

Raphael Poss; Mike Lankamp; M. Irfan Uddin; Jaroslav Sýkora; Leoš Kafka

The EU Apple-CORE project has explored the design and implementation of novel general-purpose many-core chips featuring hardware microthreading and hardware support for concurrency management. The introduction of the latter in the cores' ISA has required simultaneous investigation into compilers and multiple layers of the software stack, including operating systems. The main challenge in such vertical approaches is the cost of simultaneously implementing a detailed simulation of new hardware components and a complete system platform suitable to run large software benchmarks. In this paper, we describe our use case and our solutions to this challenge.

rapid simulation and performance evaluation methods and tools | 2012

Collecting signatures to model latency tolerance in high-level simulations of microthreaded cores

M. Irfan Uddin; Chris R. Jesshope; Michiel W. van Tol; Raphael Poss

Current many-core architectures are generally evaluated using cycle-accurate simulations. However, these detailed simulations of the architecture make the evaluation of large programs very slow. Since the focus in many-core architecture is shifting from the performance of individual cores to the overall behavior of the chip, high-level simulations are becoming necessary, which evaluate the same architecture at a less detailed level and allow the designer to make quick and reasonably accurate design decisions. We have developed a high-level simulator for the design space exploration of the Microgrid, which is a many-core architecture comprised of many fine-grained multi-threaded cores. This simulator allows us to investigate mapping and scheduling strategies of families (i.e. groups of threads) in developing an operating environment for the Microgrid. The previous method to count and evaluate the workload in basic blocks was not accurate enough. The key problem was that with many concurrent threads the latency of certain instructions is hidden because of the multi-threaded nature of the core. This paper presents a technique to determine the execution time of different types of instructions with thread concurrency. We believe that this technique achieves high accuracy in evaluating programs in the high-level simulator.

international conference on simulation and modeling methodologies technologies and applications | 2014

Signature-based high-level simulation of microthreaded many-core architectures

M. Irfan Uddin; Raphael Poss; Chris R. Jesshope

The simulation of fine-grained latency tolerance based on the dynamic state of the system in high-level simulation of many-core systems is a challenging simulation problem. We have introduced a high-level simulation technique for microthreaded many-core systems based on the assumption that the throughput of the program can always be one cycle per instruction as these systems have fine-grained latency tolerance. However, this assumption is not always true if there are insufficient threads in the pipeline and hence long-latency operations are not tolerated. In this paper we introduce Signatures to classify low-level instructions into high-level categories and estimate the performance of basic blocks during the simulation based on the concurrent threads in the pipeline. The simulation of fine-grained latency tolerance improves accuracy in the high-level simulation of many-core systems.

Journal of Systems Architecture | 2014

Cache-based high-level simulation of microthreaded many-core architectures

M. Irfan Uddin; Raphael Poss; Chris R. Jesshope

The accuracy of simulated cycles in high-level simulators is generally less than the accuracy in detailed simulators for single-core systems, because high-level simulators simulate the behaviour of components rather than the components themselves as in detailed simulators. The simulation problem becomes more challenging when simulating many-core systems, where many cores are executing instructions concurrently. In these systems data may be accessed from multiple caches and the abstraction of the instruction execution has to consider the dynamic resource sharing on the whole chip. The problem becomes even more challenging in microthreaded many-core systems, because there may exist concurrent hardware threads, which means that the latency of long-latency operations can be reduced from many cycles to just a few cycles. We have previously presented a simulation technique to improve the accuracy in high-level simulation of microthreaded many-core systems, known as the Signature-based high-level simulator, which adapts the throughput of the program based on the type of instructions, number of instructions and number of active threads in the pipeline. However, it disregards the access to different levels of the caches on the many-core system. Accessing the L1 cache has far less latency than accessing off-chip memory, and if the core is not able to tolerate latency, different levels of caches cannot be treated equally. The distributed cache network along with the synchronization-aware coherency protocol in the Microgrid is a complicated memory architecture and it is difficult to simulate its behaviour at a high level. In this article we present a high-level cache model, which aims to improve the accuracy in high-level simulators for general-purpose many-core systems by adding little complexity to the simulator and without affecting the simulation speed.

international conference on embedded computer systems architectures modeling and simulation | 2013

MGSim — A simulation environment for multi-core research and education

Raphael Poss; Mike Lankamp; Qiang Yang; Jian Fu; M. Irfan Uddin; Chris R. Jesshope

This article presents MGSim, an open source discrete event simulator for on-chip hardware components developed at the University of Amsterdam. MGSim is used as a research and teaching vehicle to study the fine-grained hardware/software interactions on many-core chips with and without hardware multithreading. MGSim's component library includes support for core models with different instruction sets, a configurable interconnect, multiple configurable cache and memory models, a dedicated I/O subsystem, and comprehensive monitoring and interaction facilities. The default model configuration shipped with MGSim implements Microgrids, a multi-core architecture with hardware concurrency management. MGSim is furthermore written mostly in C++ and uses object classes to represent chip components. It is optimized for architecture models that can be described as process networks.

european conference on parallel processing | 2010

Resource-agnostic programming for many-core microgrids

Thomas A. M. Bernard; Clemens Grelck; Michael A. Hicks; Chris R. Jesshope; Raphael Poss

Many-core architectures are a commercial reality, but programming them efficiently is still a challenge, especially if the mix is heterogeneous. Here granularity must be addressed, i.e. when to make use of concurrency resources and when not to. We have designed a data-driven, fine-grained concurrent execution model (SVP) that captures concurrency in a resource-agnostic way. Our approach separates the concern of describing a concurrent computation from its mapping and scheduling. We have implemented this model as a novel many-core architecture programmed with a language called µTC. In this paper we demonstrate how we achieve our goal of resource-agnostic programming on this target, where heterogeneity is exposed as arbitrarily sized clusters of cores.

parallel computing | 2010

Making multi-cores mainstream - from security to scalability

Chris R. Jesshope; Michael A. Hicks; Mike Lankamp; Raphael Poss; Li Zhang

In this paper we will introduce work being supported by the EU in the Apple-CORE project (http://www.apple-core.info). This project is pushing the boundaries of programming and systems development in multi-core architectures in an attempt to make multi-core go mainstream, i.e. continuing the current trends in low-power, multi-core architecture to thousands of cores on chip and supporting this in the context of the next generations of PCs. This work supports dataflow principles but with a conventional programming style. The paper describes the underlying execution model, a core design based on this model and its emulation in software. We also consider system issues that impact security. The major benefits of this approach include asynchrony, i.e. the ability to tolerate long latency operations without impacting performance and binary compatibility. We present results that show very high efficiency and good scalability despite the high memory access latency in the proposed chip architecture.

implementation and application of functional languages | 2010

Concurrent non-deferred reference counting on the Microgrid: first experiences

Stephan Herhut; Carl Joslin; Sven-Bodo Scholz; Raphael Poss; Clemens Grelck

We present a first evaluation of our novel approach for non-deferred reference counting on the Microgrid many-core architecture. Non-deferred reference counting is a fundamental building block of implicit heap management of functional array languages in general and Single Assignment C in particular. Existing lock-free approaches for multi-core and SMP settings do not scale well for large numbers of cores in emerging many-core platforms. We, instead, employ a dedicated core for reference counting and use asynchronous messaging to emit reference counting operations. This novel approach decouples computational workload from reference-counting overhead. Experiments using cycle-accurate simulation of a realistic Microgrid show that, by exploiting asynchronism, we are able to tolerate even worst-case reference counting loads reasonably well. Scalability is essentially limited only by the combined sequential runtime of all reference counting operations, in accordance with Amdahl's law. Even though developed in the context of Single Assignment C and the Microgrid, our approach is applicable to a wide range of languages and platforms.

ACM Transactions on Embedded Computing Systems | 2014

On-chip traffic regulation to reduce coherence protocol cost on a microthreaded many-core architecture with distributed caches

Qiang Yang; Jian Fu; Raphael Poss; Chris R. Jesshope

When hardware cache coherence scales to many cores on chip, oversaturated traffic of the shared memory system may offset the benefit from massive hardware concurrency. In this article, we investigate the cost of a write-update protocol in terms of on-chip memory network traffic and its adverse effects on the system performance based on a multithreaded many-core architecture with distributed caches. We discuss possible software and hardware solutions to alleviate the network pressure. We find that in the context of massive concurrency, by introducing a write-merging buffer with 0.46% area overhead to each core, applications with good locality and concurrency are boosted by 18.74% in performance on average. Other applications also benefit from this addition and even achieve a throughput increase of 5.93%. In addition, this improvement indicates that higher levels of concurrency per core can be exploited without impacting performance, thus tolerating latency better and giving higher processor efficiencies compared to other solutions.

parallel, distributed and network-based processing | 2014

Analytical-Based High-Level Simulation of the Microthreaded Many-Core Architectures

M. Irfan Uddin; Raphael Poss; Chris R. Jesshope

High-level simulation is becoming commonly used for design space exploration of many-core systems. We have been working on high-level simulation techniques for the microthreaded many-core architecture at the University of Amsterdam. In previous work different levels of high-level simulation for instruction execution have been proposed, where the objective of every level is to keep the highest possible abstraction in order to achieve the least complexity and highest simulation speed with a compromise on the amount of accuracy. In this article we propose a new breakthrough in abstraction by simulating entire components in applications using analytical models. This simulation technique greatly reduces the complexity of the simulator and increases the simulation speed by orders of magnitude compared to the other levels of the high-level simulator, without affecting the simulation accuracy.