Publication


Featured research published by Pablo Reble.


International Conference on High Performance Computing and Simulation | 2011

Evaluation and improvements of programming models for the Intel SCC many-core processor

Carsten Clauss; Stefan Lankes; Pablo Reble; Thomas Bemmerl

Since the beginning of the multicore era, parallel processing has become prevalent across the board. On a traditional multicore system, a single operating system manages all cores and schedules threads and processes among them, inherently supported by hardware-implemented cache coherence protocols. However, further growth in the number of cores per system implies increasing chip complexity, especially with respect to the cache coherence protocols. Therefore, a very attractive alternative for future many-core systems is to waive hardware-based cache coherency and to introduce a software-oriented, message-passing-based architecture instead: a so-called Cluster-on-Chip architecture. Intel's Single-chip Cloud Computer (SCC), a many-core research processor with 48 non-coherent memory-coupled cores, is a very recent example of such a Cluster-on-Chip architecture. The SCC can be configured to run one operating system instance per core by partitioning the shared main memory in a strict manner. However, it is also possible to access the shared main memory in an unsplit and concurrent manner, provided that cache coherency is then ensured by software. In this paper, we detail our first experiences gained while developing low-level software for message-passing and shared-memory programming on the SCC. In doing so, we evaluate the potential of both programming models and show how these models can be improved, especially with respect to the SCC's many-core architecture.


Programming Models and Applications for Multicores and Manycores | 2012

Revisiting shared virtual memory systems for non-coherent memory-coupled cores

Stefan Lankes; Pablo Reble; Oliver Sinnen; Carsten Clauss

The growing number of cores per chip implies increasing chip complexity, especially with respect to hardware-implemented cache coherence protocols. An attractive alternative for future many-core systems is to waive hardware-based cache coherency and to introduce a software-oriented approach instead: a so-called Cluster-on-Chip architecture. The Single-chip Cloud Computer (SCC) is a recent research processor that exemplifies such an architecture. This paper presents an approach to deal with the missing cache coherence protocol by using a software-managed cache coherence system, which is based on the well-established concept of a shared virtual memory (SVM) management system. The SCC's unique features, such as a new memory type integrated directly on the processor die, open up new and capable options for realizing an SVM system. The convincing performance results presented in this paper show that nearly forgotten concepts will become attractive again for future many-core systems.


Concurrency and Computation: Practice and Experience | 2015

New system software for parallel programming models on the Intel SCC many‐core processor

Carsten Clauss; Stefan Lankes; Pablo Reble; Thomas Bemmerl

Since the beginning of the multicore era, parallel processing has become prevalent across the board. On a traditional multicore system, a single operating system manages all cores and schedules threads and processes among them, inherently supported by hardware‐implemented cache coherence protocols. However, further growth in the number of cores per system implies increasing chip complexity, especially with respect to the cache coherence protocols. Therefore, a very attractive alternative for future many‐core systems is to waive hardware‐based cache coherency and to introduce a software‐oriented, message‐passing‐based architecture instead: a so‐called Cluster‐on‐Chip architecture. Intel's Single‐chip Cloud Computer (SCC), a many‐core research processor with 48 non‐coherent memory‐coupled cores, is a very recent example of such a Cluster‐on‐Chip architecture. The SCC can be configured to run one operating system instance per core by partitioning the shared main memory in a strict manner. However, it is also possible to access the shared main memory in an unsplit and concurrent manner, provided that either the caches are disabled or cache coherency is ensured by software. In this article, we detail our experiences gained while developing low‐level software for message‐passing and shared‐memory programming on the SCC. We present an SCC‐customized MPI library (called SCC‐MPICH) as well as a shared virtual memory system (called MetalSVM) for the SCC. In doing so, we evaluate the potential of both programming models and show how these models can be improved, especially with respect to the SCC's many‐core architecture.


International Conference on High Performance Computing and Simulation | 2013

One-sided communication and synchronization for non-coherent memory-coupled cores

Pablo Reble; Carsten Clauss; Stefan Lankes

The trend towards the integration of many cores per chip will raise the demand for new many-core architectures if established multi-core techniques such as hardware-implemented cache coherence limit scalability. Parallel applications, especially those with dynamically changing access patterns, can benefit from software-supported weaker memory consistency models in combination with one-sided communication. The realization of one-sided communication through direct access to non-coherent shared memory requires flush operations and synchronization points. In this paper, we analyze the effect of low-overhead hardware-supported synchronization methods for shared memory windows with different memory models as described by the one-sided communication extensions of MPI-3. Moreover, we compare this approach with a common message-passing-based implementation.


European Conference on Parallel Processing | 2013

Towards Predictability of Operating System Supported Communication for PCIe Based Clusters

Pablo Reble; Georg Wassen

In unconventional clusters, the operating system can provide helper tasks to work around missing hardware features or to improve performance. In this paper, we present a concept for operating-system-supported data transfer via remote memory access through PCI Express. The performance of such many-core clusters is very sensitive to interference. We analyze the predictability of on-chip and inter-device data transfers. Using the example of two tightly coupled Intel SCCs, we evaluate our approach with methods from real-time research.


Computer Science - Research and Development | 2016

Editorial for the special issue on Energy-aware high performance computing

Pablo Reble; Thomas Ludwig; Matthias S. Müller; Wolfgang E. Nagel; Vincent Heuveline

In 2010 we started the EnA-HPC conference series in order to bring researchers, vendors, and HPC center administrators together. Its purpose is to foster discussions regarding the status and future of energy awareness in high performance computing. Fields of interest cover all layers, from the lowest level of hardware technology, via operating system, compiler and application issues, to facility technologies like air conditioning, sensor technology and heat reuse. After five successful conferences (2010 to 2012 in Hamburg and 2013/2014 in Dresden), EnA-HPC is taking a break in 2015. This special issue includes selected articles that have been submitted since the last event. According to the U.S. Department of Energy, 20 MW is a practical power limit for future Exaflop computers. A comprehensive effort at many levels is necessary to yield the overall energy reduction required to enable Exaflop computers that stay within this limit. Today’s fastest supercomputer, Tianhe-2 (MilkyWay-2) installed at the National Super


Programming Models and Applications for Multicores and Manycores | 2015

Effective communication for a system of cluster-on-a-chip processors

Pablo Reble; Stefan Lankes; Fabian Fischer; Matthias S. Müller

In this work, we analyze efficient communication methods for a grid of many-core processors in the absence of cache coherence. For this study, we build a multi-chip processor with 240 tightly connected cores and demonstrate its scalability. This processor is based on the Intel SCC, a cluster-on-a-chip research processor with 48 non-coherent memory-coupled cores. Our new research system virtually extends the on-chip network of multiple SCC systems and provides new communication functionality for direct on-chip memory access. We analyze access patterns of different communication schemes and apply techniques to hide latency, such as offloading communication and software caching with relaxed consistency.


Archive | 2012

The path to MetalSVM: Shared virtual memory for the SCC

Stefan Lankes; Pablo Reble; Carsten Clauss; Oliver Sinnen


MARC Symposium | 2011

A Fast Inter-Kernel Communication and Synchronization layer for MetalSVM

Pablo Reble; Stefan Lankes; Carsten Clauss; Thomas Bemmerl


MARC@RWTH | 2012

Connecting the Cloud: Transparent and Flexible Communication for a Cluster of Intel SCCs

Pablo Reble; Carsten Clauss; Michael Riepen; Stefan Lankes; Thomas Bemmerl

Collaboration


Dive into Pablo Reble's collaboration.

Top Co-Authors

Bronis R. de Supinski

Lawrence Livermore National Laboratory
