Publication


Featured research published by Rik Jongerius.


International Conference on Computer Design | 2015

Analytic processor model for fast design-space exploration

Rik Jongerius; Giovanni Mariani; Andreea Anghel; Gero Dittmann; Erik Vermij; Henk Corporaal

In this paper, we propose an analytic model that takes as inputs a) a parametric, microarchitecture-independent characterization of the target workload, and b) a hardware configuration of the core and the memory hierarchy, and returns as output an estimate of processor-core performance. To validate our technique, we compare our performance estimates with measurements on an Intel® Xeon® system. The average error increases from 21% for a state-of-the-art simulator to 25% for our model, but we achieve a speedup of several orders of magnitude. Thus, the model enables fast design-space exploration and represents a first step towards an analytic exascale system model.
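As an illustrative sketch of the kind of model the abstract describes — not the paper's published model — core performance can be estimated analytically by combining a base CPI with memory-stall cycles derived from workload miss rates and hardware latencies. All parameter names and values below are invented for illustration.

```python
# Minimal analytic core-performance sketch: cycles = instructions * (base CPI
# + stall CPI), where stall CPI is accumulated per memory-hierarchy level.
# The CPI decomposition and all numbers are illustrative assumptions.

def estimate_cycles(instructions, base_cpi, miss_rates, miss_latencies):
    """Estimate execution cycles as base work plus memory-stall cycles.

    miss_rates and miss_latencies map a memory level (e.g. 'L2', 'DRAM')
    to misses-per-instruction and miss penalty in cycles, respectively.
    """
    stall_cpi = sum(miss_rates[lvl] * miss_latencies[lvl] for lvl in miss_rates)
    return instructions * (base_cpi + stall_cpi)

# Hypothetical workload characterization and hardware configuration.
cycles = estimate_cycles(
    instructions=1_000_000,
    base_cpi=0.5,                            # ideal issue-limited CPI
    miss_rates={"L2": 0.02, "DRAM": 0.005},  # misses per instruction
    miss_latencies={"L2": 12, "DRAM": 200},  # penalty in cycles
)
print(cycles)  # ~1.74e6: CPI is 0.5 + 0.02*12 + 0.005*200 = 1.74
```

Evaluating such a closed-form expression takes microseconds per design point, which is what makes exhaustive design-space sweeps feasible where cycle-accurate simulation is not.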


IEEE International Conference on High Performance Computing, Data, and Analytics | 2015

Challenges in exascale radio astronomy: Can the SKA ride the technology wave?

Erik Vermij; Leandro Fiorin; Rik Jongerius; Christoph Hagleitner; Koen Bertels

The Square Kilometre Array (SKA) will be the most sensitive radio telescope in the world. This unprecedented sensitivity will be achieved by combining and analyzing signals from 262,144 antennas and 350 dishes at a raw data rate of petabits per second. The processing pipeline to create useful astronomical data will require hundreds of peta-operations per second, at a very limited power budget. We analyze the compute, memory and bandwidth requirements for the key algorithms used in the SKA. By studying their implementation on existing platforms, we show that most algorithms have properties that map inefficiently onto current hardware, such as a low compute-to-bandwidth ratio and complex arithmetic. In addition, we estimate the power breakdown on CPUs and GPUs, analyze the cache behavior on CPUs, and discuss possible improvements. This work is complemented with an analysis of supercomputer trends, which demonstrates that current efforts to use commercial off-the-shelf accelerators result in a two-to-three-times smaller improvement in compute capabilities and power efficiency than custom-built machines. We conclude that waiting for new technology to arrive will not give us the instruments currently planned in 2018: one or two orders of magnitude better power efficiency and compute capabilities are required. Novel hardware and system architectures, matched to the needs and features of this unique project, must be developed.
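The low compute-to-bandwidth ratio the abstract mentions can be made concrete with a standard roofline check: when a kernel performs few operations per byte moved, its attainable performance is capped by memory bandwidth, not by peak compute. The kernel intensity and machine numbers below are illustrative assumptions, not figures from the paper.

```python
# Roofline sketch: attainable performance is the minimum of the machine's
# compute peak and (arithmetic intensity * memory bandwidth).
# All numbers are invented for illustration.

def attainable_gflops(intensity, peak_gflops, peak_gbps):
    """Performance ceiling for a kernel with the given FLOP/byte intensity."""
    return min(peak_gflops, intensity * peak_gbps)

kernel_intensity = 0.5  # FLOP per byte moved: a low, bandwidth-bound ratio
machine = {"peak_gflops": 1000.0, "peak_gbps": 100.0}

perf = attainable_gflops(kernel_intensity, **machine)
print(perf, perf / machine["peak_gflops"])  # 50.0 GFLOPS, i.e. 5% of peak
```

Under these assumed numbers the kernel reaches only 5% of the machine's peak, which is the shape of inefficiency the paper reports for several SKA algorithms on current hardware.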


Computing Frontiers | 2015

An energy-efficient custom architecture for the SKA1-low central signal processor

Leandro Fiorin; Erik Vermij; Jan van Lunteren; Rik Jongerius; Christoph Hagleitner

The Square Kilometre Array (SKA) will be the biggest radio telescope ever built, with unprecedented sensitivity, angular resolution, and survey speed. This paper explores the design of a custom architecture for the central signal processor (CSP) of SKA1-Low, the SKA's aperture-array instrument consisting of 131,072 antennas. SKA1-Low's antennas receive signals between 50 and 350 MHz. After digitization and preliminary processing, samples are moved to the CSP for further processing. In this work, we describe the challenges in building the CSP, and present a first quantitative study for the implementation of a custom hardware architecture for processing the main CSP algorithms. By taking advantage of emerging 3D-stacked-memory devices and by exploring the design space for a 14-nm implementation, we estimate a power consumption of 14.4 W for processing all channels of a sub-band and an energy efficiency at application level of up to 208 GFLOPS/W for our architecture.
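The two figures in the abstract relate through the definition of energy efficiency, throughput divided by power; a back-of-the-envelope check using only the abstract's own numbers:

```python
# Relation: efficiency (GFLOPS/W) = sustained throughput (GFLOPS) / power (W).
# Both inputs are taken directly from the abstract above.

power_w = 14.4      # W per sub-band, from the abstract
efficiency = 208.0  # GFLOPS/W at application level, from the abstract

throughput_gflops = efficiency * power_w
print(throughput_gflops)  # ~2995 GFLOPS, i.e. roughly 3 TFLOPS sustained
```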


International Conference on Acoustics, Speech, and Signal Processing | 2014

Scalable, efficient ASICs for the Square Kilometre Array: From A/D conversion to central correlation

Martin L. Schmatz; Rik Jongerius; Gero Dittmann; Andreea Anghel; Ton Engbersen; Jan van Lunteren; Peter Buchmann

The Square Kilometre Array (SKA) is a future radio telescope, currently being designed by the worldwide radio-astronomy community. During the first of two construction phases, more than 250,000 antennas will be deployed, clustered in aperture-array stations. The antennas will generate 2.5 Pb/s of data, which needs to be processed in real time. For the processing stages from A/D conversion to central correlation, we propose an ASIC solution using only three chip architectures. The architecture is scalable (additional chips support additional antennas or beams) and versatile (it can relocate its receiver band within a range from a few MHz up to 4 GHz). This flexibility makes it applicable to both SKA phases 1 and 2. The proposed chips implement an antenna and station processor for 289 antennas with a power consumption on the order of 600 W, and a correlator, including corner turn, for 911 stations on the order of 90 kW.
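A quick sanity check on the abstract's aggregate data rate, using only the figures stated above: 250,000 antennas producing 2.5 Pb/s in total implies roughly 10 Gb/s per antenna.

```python
# Per-antenna data rate implied by the abstract's aggregate figures.

total_bps = 2.5e15  # 2.5 Pb/s aggregate, from the abstract
antennas = 250_000  # phase-1 antenna count, from the abstract

per_antenna_gbps = total_bps / antennas / 1e9
print(per_antenna_gbps)  # 10.0 Gb/s per antenna
```

A sustained ~10 Gb/s per antenna is why the paper argues for dedicated ASICs rather than general-purpose hardware in the early processing stages.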


International Journal of Parallel Programming | 2016

An Instrumentation Approach for Hardware-Agnostic Software Characterization

Andreea Anghel; Laura Mihaela Vasilescu; Giovanni Mariani; Rik Jongerius; Gero Dittmann

Simulators and empirical profiling data are often used to understand how suitable a specific hardware architecture is for an application. However, simulators can be slow, and empirical profiling-based methods can only provide insights about the existing hardware on which the applications are executed. While the insights obtained in this way are valuable, such methods cannot be used to evaluate a large number of system designs efficiently. Analytical performance evaluation models are fast alternatives, particularly well-suited for system design-space exploration. However, to be truly application-specific, they need to be combined with a workload model that captures relevant application characteristics. In this paper, we introduce PISA, a framework based on the LLVM infrastructure that is able to generate such a model for sequential and parallel applications by performing hardware-independent characterization. Characteristics such as instruction-level parallelism, memory access patterns and branch behavior are analyzed per thread or process during application execution. To illustrate the potential of the framework, we provide a detailed characterization of a representative benchmark for graph-based analytics, Graph 500. Finally, we analyze how the properties extracted with PISA across Graph 500 and SPEC CPU2006 applications compare to measurements performed on x86 and POWER8 processors.
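PISA itself instruments applications at the LLVM level; the toy below only illustrates the flavor of one hardware-independent metric it extracts, an instruction mix over a dynamic trace. The trace contents and opcode categories are invented for illustration.

```python
# Toy hardware-independent characterization: fraction of the dynamic
# instruction stream per opcode category. The trace is invented.

from collections import Counter

def instruction_mix(trace):
    """Return each opcode category's share of the dynamic instruction count."""
    counts = Counter(trace)
    total = len(trace)
    return {op: counts[op] / total for op in counts}

trace = ["load", "fmul", "fadd", "load", "store", "branch", "load", "fadd"]
mix = instruction_mix(trace)
print(mix["load"])  # 0.375 — 3 of the 8 dynamic instructions are loads
```

Because such metrics are counted on the compiler IR rather than on machine code, the same profile can feed analytic models of many different target architectures.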


AFRICON | 2013

Computing cost of sensitivity and survey speed for aperture array and phased array feed systems

Stefan J. Wijnholds; Rik Jongerius

Aperture array (AA) and phased array feed (PAF) systems are envisaged to play an important role in the Square Kilometre Array (SKA), because their multi-beaming capability can be used to improve survey speed and enhance observing flexibility. In this paper, we demonstrate that computing costs and I/O rates should be considered as an integral part of the overall system design. We do so by comparing the correlator and imaging computing requirements for the current SKA phase 1 (SKA1) AA-low baseline design with those for an alternative design with the same survey speed and sensitivity. We also compare the correlator and imaging computing demands and survey speed for the proposed SKA1 survey array (dishes with PAFs), the envisaged SKA1 dish array (dishes with single pixel feeds (SPFs)) and an AA-mid alternative design for the 300-1000 MHz range. We conclude that the current SKA1 baseline design may not be the optimal solution in view of computing requirements (hence operating costs) for given sensitivity and survey speed.
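One reason correlator cost must be designed in from the start, as the abstract argues, is that the workload grows with the number of station pairs (baselines), i.e. quadratically in the number of stations. The station counts below are illustrative, not the designs compared in the paper.

```python
# Correlator work scales with the number of station pairs. Counts are
# illustrative; autocorrelations are included.

def baselines(n_stations):
    """Number of station pairs an FX correlator must process (incl. autos)."""
    return n_stations * (n_stations + 1) // 2

for n in (100, 200):
    print(n, baselines(n))
# Doubling the station count roughly quadruples the correlation work:
# 100 stations -> 5050 baselines, 200 stations -> 20100 baselines.
```

This quadratic growth is why two array designs with the same sensitivity and survey speed can have very different computing (and hence operating) costs.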


IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing | 2017

Predicting Cloud Performance for HPC Applications: A User-Oriented Approach

Giovanni Mariani; Andreea Anghel; Rik Jongerius; Gero Dittmann

Cloud computing enables end users to execute high-performance computing applications by renting the required computing power. This pay-for-use approach enables small enterprises and startups to run HPC-related businesses with a significant saving in capital investment and a short time to market. When deploying an application in the cloud, the users may a) fail to understand the interactions of the application with the software layers implementing the cloud system, b) be unaware of some hardware details of the cloud system, and c) fail to understand how sharing part of the cloud system with other users might degrade application performance. These misunderstandings may lead the users to select suboptimal cloud configurations in terms of cost or performance. To aid the users in selecting the optimal cloud configuration for their applications, we suggest that the cloud provider generate a prediction model for the provided system. We propose applying machine-learning techniques to generate this prediction model. First, the cloud provider profiles a set of training applications by means of a hardware-independent profiler and then executes these applications on a set of training cloud configurations to collect actual performance values. The prediction model is trained to learn the dependencies of actual performance data on the application profile and cloud configuration parameters. The advantage of using a hardware-independent profiler is that the cloud users and the cloud provider can analyze applications on different machines and interface with the same prediction model. We validate the proposed methodology for a cloud system implemented with OpenStack. We apply the prediction model to the NAS parallel benchmarks. The resulting relative error is below 15%, and the Pareto-optimal cloud configurations found by maximizing application speed and minimizing execution cost on the prediction model are also at most 15% away from the actual optimal solutions.
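The core idea — learn runtime from hardware-independent profile features plus cloud-configuration parameters — can be sketched with a toy one-feature linear fit standing in for the paper's machine-learning model. The feature, the training data, and the fitted relationship are all invented for illustration.

```python
# Toy stand-in for the prediction model: ordinary least squares mapping one
# profile feature (work per core) to measured runtime. Data are invented.

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Training data: (work per core, measured runtime) on training configurations.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 4.0, 6.1, 7.9]  # roughly runtime = 2 * work, with noise
a, b = fit_line(xs, ys)
print(round(a, 2))  # slope close to 2: runtime scales with per-core work
```

The real model is multivariate and nonlinear, but the workflow is the same: the provider fits it once on training runs, and any user's hardware-independent profile can then be fed in to predict performance on untried configurations.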


International Journal of Parallel Programming | 2016

Scaling Properties of Parallel Applications to Exascale

Giovanni Mariani; Andreea Anghel; Rik Jongerius; Gero Dittmann

A detailed profile of exascale applications helps to understand the computation, communication and memory requirements for exascale systems and provides the insight necessary for fine-tuning the computing architecture. Obtaining such a profile is challenging as exascale systems will process unprecedented amounts of data. Profiling applications at the target scale would require the exascale machine itself. In this work we propose a methodology to extrapolate the exascale profile from experimental observations over datasets feasible for today's machines. Extrapolation models are carefully selected by means of statistical techniques and a high-level complexity analysis is included in the selection process to speed up the learning phase and to improve the accuracy of the final model. We extrapolate run-time properties of the target applications including information about the instruction mix, memory access pattern, instruction-level parallelism, and communication requirements. Compared to state-of-the-art techniques, the proposed methodology reduces the prediction error by an order of magnitude on the instruction count and improves the accuracy by up to 1.3×.
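The extrapolation idea can be sketched with a power-law fit in log-log space: fit a scaling model on problem sizes that are feasible today, then evaluate it at the target scale. The model choice and the data points below are illustrative assumptions, not the paper's statistically selected models.

```python
# Sketch: fit counts ~ c * n^k on small problem sizes, then extrapolate.
# The power-law form and all numbers are invented for illustration.

import math

def fit_power_law(ns, counts):
    """Fit counts ~ c * n^k by least squares in log-log space."""
    lx = [math.log(n) for n in ns]
    ly = [math.log(v) for v in counts]
    m = len(lx)
    mx, my = sum(lx) / m, sum(ly) / m
    k = sum((x - mx) * (y - my) for x, y in zip(lx, ly)) / \
        sum((x - mx) ** 2 for x in lx)
    c = math.exp(my - k * mx)
    return c, k

# Instruction counts measured at feasible problem sizes (invented numbers,
# consistent with counts = 2 * n^2).
ns = [1e3, 1e4, 1e5]
counts = [2e6, 2e8, 2e10]
c, k = fit_power_law(ns, counts)
print(round(k, 2), c * 1e9 ** k)  # recovered exponent ~2; extrapolate to n=1e9
```

The paper's complexity analysis plays the role of constraining which functional forms (here, the power law) are even considered, which is what keeps such extrapolations from overfitting the small-scale observations.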


Computing Frontiers | 2015

Scaling application properties to exascale

Giovanni Mariani; Andreea Anghel; Rik Jongerius; Gero Dittmann


Computing Frontiers | 2016

An architecture for near-data processing systems

Erik Vermij; Christoph Hagleitner; Leandro Fiorin; Rik Jongerius; Jan van Lunteren; Koen Bertels

