
Publications


Featured research published by Gero Dittmann.


Asia and South Pacific Design Automation Conference | 2008

Exploring power management in multi-core systems

Reinaldo A. Bergamaschi; Guoling Han; Alper Buyuktosunoglu; Hiren D. Patel; Indira Nair; Gero Dittmann; Geert Janssen; Nagu R. Dhanwada; Zhigang Hu; Pradip Bose; John A. Darringer

Power dissipation has become a critical design metric in microprocessor-based system design. In a multi-core system running multiple applications, power and performance can be traded off dynamically using an integrated power management (PM) unit. This PM unit monitors the performance and power of each core and dynamically adjusts the individual voltages and frequencies to maximize system performance under a given power budget (usually set by the operating system). This paper presents a performance and power analysis methodology, featuring a simulation model for multi-core systems that can be easily reconfigured for different scenarios and a PM infrastructure for the exploration and analysis of PM algorithms. Two algorithms have been implemented: one for discrete and one for continuous power modes, the latter based on non-linear programming. Extensive experiments are reported, illustrating the effect of power management at both the core and the chip level.
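
A minimal sketch of the discrete-mode idea, not the paper's algorithm: each core chooses among a few voltage/frequency modes, and a greedy loop upgrades the busiest core while the chip-level power budget permits. The mode table and utilization values below are invented for illustration.

```python
MODES = [  # (frequency GHz, power W) per core -- hypothetical values
    (1.0, 4.0),
    (1.5, 7.0),
    (2.0, 12.0),
]

def manage_power(utilization, budget_w):
    """Assign a mode index to each core under a total power budget."""
    modes = [0] * len(utilization)          # start every core in the lowest mode
    power = sum(MODES[m][1] for m in modes)
    while True:
        # Cores that can still be upgraded without exceeding the budget.
        candidates = [
            (utilization[i], i) for i in range(len(modes))
            if modes[i] + 1 < len(MODES)
            and power - MODES[modes[i]][1] + MODES[modes[i] + 1][1] <= budget_w
        ]
        if not candidates:
            return modes
        _, i = max(candidates)              # upgrade the busiest eligible core
        power += MODES[modes[i] + 1][1] - MODES[modes[i]][1]
        modes[i] += 1

print(manage_power([0.9, 0.2, 0.6, 0.4], budget_w=30.0))  # -> [2, 0, 1, 1]
```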


IEEE/ACM Transactions on Networking | 2005

Robust header compression (ROHC) in next-generation network processors

David E. Taylor; Andreas Herkersdorf; Andreas C. Döring; Gero Dittmann

Robust Header Compression (ROHC) provides for more efficient use of radio links for wireless communication in a packet switched network. Due to its potential advantages in the wireless access area and the proliferation of network processors in access infrastructure, there exists a need to understand the resource requirements and architectural implications of implementing ROHC in this environment. We present an analysis of the primary functional blocks of ROHC and extract the architectural implications on next-generation network processor design for wireless access. The discussion focuses on memory space and bandwidth dimensioning as well as processing resource budgets. We conclude with an examination of resource consumption and potential performance gains achievable by offloading computationally intensive ROHC functions to application specific hardware assists. We explore the design tradeoffs for hardware assists in the form of reconfigurable hardware, Application-Specific Instruction-set Processors (ASIPs), and Application-Specific Integrated Circuits (ASICs).
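
To make the dimensioning discussion concrete, a back-of-envelope sketch in the same spirit; every number here (context size, flow count, packet rate, accesses per packet) is an assumption for illustration, not a figure from the paper.

```python
# Rough ROHC memory-dimensioning arithmetic with assumed parameters.
CONTEXT_BYTES = 300        # assumed per-flow compression context size
FLOWS = 100_000            # assumed concurrent compressed flows
PACKETS_PER_SEC = 5e6      # assumed aggregate packet rate
CTX_ACCESSES_PER_PKT = 2   # read context, write updated context

context_memory_mib = CONTEXT_BYTES * FLOWS / 2**20
context_bw_gbps = CONTEXT_BYTES * CTX_ACCESSES_PER_PKT * PACKETS_PER_SEC * 8 / 1e9

print(f"context memory: {context_memory_mib:.1f} MiB")          # ~28.6 MiB
print(f"context memory bandwidth: {context_bw_gbps:.1f} Gb/s")  # ~24.0 Gb/s
```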


International Conference on Hardware/Software Codesign and System Synthesis | 2007

Performance modeling for early analysis of multi-core systems

Reinaldo A. Bergamaschi; Indira Nair; Gero Dittmann; Hiren D. Patel; Geert Janssen; Nagu R. Dhanwada; Alper Buyuktosunoglu; Emrah Acar; Gi-Joon Nam; Dorothy Kucar; Pradip Bose; John A. Darringer; Guoling Han

Performance analysis of microprocessors is a critical step in defining the microarchitecture, prior to register-transfer-level (RTL) design. In complex chip multiprocessor systems, this problem is compounded by performance interactions between cores, caches and interconnects, as well as by tight interdependencies between the performance, power and physical characteristics of the design (i.e., the floorplan). Although there are many point tools for analyzing the performance, power, or floorplan of complex systems-on-chip (SoCs), there are surprisingly few integrated tools capable of analyzing these system characteristics simultaneously and allowing the user to explore different design configurations and their effects on performance, power, size and thermal behavior. This paper describes an integrated tool for early analysis of the performance, power, physical and thermal characteristics of multi-core systems. It includes cycle-accurate, transaction-level SystemC-based performance models of POWER processors and system components (e.g., caches and buses). Power models for power computation, physical models for floorplanning, and packaging models for thermal analysis are also included. The tool allows the user to build different systems by selecting components from a library and connecting them in a visual environment. Using these models, users can simulate and dynamically analyze the performance, power and thermal aspects of multi-core systems.
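
A toy transaction-level simulation conveys the modeling style: cores issue memory transactions to a shared bus that serializes them, so contention becomes visible as the core count grows. The timings are invented; the actual tool uses cycle-accurate SystemC models of POWER cores and system components.

```python
import heapq

BUS_CYCLES = 4  # assumed cycles per bus transaction

def simulate(cores, requests_per_core, compute_cycles):
    """Total cycles when `cores` cores interleave compute and shared-bus use."""
    bus_free = 0                                           # when the bus frees up
    events = [(compute_cycles, c) for c in range(cores)]   # first request per core
    heapq.heapify(events)
    done = [0] * cores
    finish = 0
    while events:
        t, c = heapq.heappop(events)
        start = max(t, bus_free)            # wait for the shared bus if busy
        bus_free = start + BUS_CYCLES
        done[c] += 1
        finish = max(finish, bus_free)
        if done[c] < requests_per_core:     # compute, then request again
            heapq.heappush(events, (bus_free + compute_cycles, c))
    return finish

for n in (1, 2, 4, 8):
    print(n, "cores:", simulate(n, requests_per_core=100, compute_cycles=10))
```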


International Conference on Computer Design | 2015

Analytic processor model for fast design-space exploration

Rik Jongerius; Giovanni Mariani; Andreea Anghel; Gero Dittmann; Erik Vermij; Henk Corporaal

In this paper, we propose an analytic model that takes as inputs a) a parametric, microarchitecture-independent characterization of the target workload and b) a hardware configuration of the core and the memory hierarchy, and returns an estimate of processor-core performance. To validate our technique, we compare our performance estimates with measurements on an Intel® Xeon® system. The average error increases from 21% for a state-of-the-art simulator to 25% for our model, but we achieve a speedup of several orders of magnitude. The model thus enables fast design-space exploration and represents a first step towards an analytic exascale system model.
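
A hypothetical model of this flavor might decompose cycles per instruction into a base term bounded by issue width and workload ILP, plus stall terms from cache misses and branch mispredictions. The decomposition and all constants below are illustrative assumptions, not the paper's published model.

```python
def estimate_cycles(profile, core):
    """profile: microarchitecture-independent workload stats; core: hw config."""
    n = profile["instructions"]
    # Base throughput limited by issue width and the workload's inherent ILP.
    base_cpi = 1.0 / min(core["issue_width"], profile["ilp"])
    # Stall contributions from cache misses and branch mispredictions.
    mem_cpi = (profile["l1_miss_ratio"] * core["l2_latency"]
               + profile["l2_miss_ratio"] * core["mem_latency"])
    br_cpi = profile["branch_mpki"] / 1000.0 * core["mispredict_penalty"]
    return n * (base_cpi + mem_cpi + br_cpi)

profile = {"instructions": 1e9, "ilp": 2.5, "l1_miss_ratio": 0.02,
           "l2_miss_ratio": 0.005, "branch_mpki": 4.0}
core = {"issue_width": 4, "l2_latency": 12, "mem_latency": 200,
        "mispredict_penalty": 15}
print(f"estimated cycles: {estimate_cycles(profile, core):.3e}")
```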


IEEE International Conference on High Performance Computing, Data, and Analytics | 2015

Quantifying Communication in Graph Analytics

Andreea Anghel; German Rodriguez; Cyriel Minkenberg; Gero Dittmann

Data analytics requires complex processing, often taking the shape of parallel graph-based workloads. To make these applications efficient, it is key to understand where their bottlenecks lie, and in particular to what extent their performance is computation- or communication-bound. In this work, we analyze a reference workload in graph-based analytics, the Graph 500 benchmark. We conduct a wide array of tests on a high-performance computing system, the MareNostrum III supercomputer, using a custom high-precision profiling methodology. We show that the application's performance is communication-bound, with up to 80% of the execution time spent enabling communication. We also show that, with the increase in scale and concurrency expected in future big-data systems and applications, the importance of communication grows. Finally, we characterize this representative data-analytics workload and show that the dominant data exchange is uniform all-to-all communication, opening avenues for workload and network optimization.
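
The two headline measurements can be illustrated on synthetic data: the fraction of run time spent in communication, and the uniformity of the rank-to-rank traffic matrix, where a coefficient of variation near zero indicates uniform all-to-all exchange. The numbers below are fabricated; the paper derives them from MPI profiles of Graph 500 on MareNostrum III.

```python
import numpy as np

comm_time = np.array([7.9, 8.1, 8.0, 7.8])    # seconds in MPI per rank (made up)
total_time = np.array([10.0, 10.1, 9.9, 10.0])
print("communication fraction:", comm_time.sum() / total_time.sum())

# Synthetic rank-to-rank traffic matrix (bytes exchanged between ranks).
traffic = np.random.default_rng(0).normal(100.0, 5.0, size=(4, 4))
np.fill_diagonal(traffic, 0.0)                 # no self-traffic
off_diag = traffic[~np.eye(4, dtype=bool)]
# Coefficient of variation near zero -> traffic is close to uniform all-to-all.
print("traffic CV:", off_diag.std() / off_diag.mean())
```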


International Conference on Acoustics, Speech, and Signal Processing | 2014

Scalable, efficient ASICs for the Square Kilometre Array: From A/D conversion to central correlation

Martin L. Schmatz; Rik Jongerius; Gero Dittmann; Andreea Anghel; Ton Engbersen; Jan van Lunteren; Peter Buchmann

The Square Kilometre Array (SKA) is a future radio telescope, currently being designed by the worldwide radio-astronomy community. During the first of two construction phases, more than 250,000 antennas will be deployed, clustered in aperture-array stations. The antennas will generate 2.5 Pb/s of data, which needs to be processed in real time. For the processing stages from A/D conversion to central correlation, we propose an ASIC solution using only three chip architectures. The architecture is scalable, as additional chips support additional antennas or beams, and versatile, as it can relocate its receiver band anywhere from a few MHz up to 4 GHz. This flexibility makes it applicable to both SKA phases 1 and 2. The proposed chips implement an antenna and station processor for 289 antennas with a power consumption on the order of 600 W, and a correlator, including the corner turn, for 911 stations on the order of 90 kW.
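
A back-of-envelope check of the quoted aggregate rate; the per-antenna sampling parameters below are assumptions chosen so that 250,000 antennas land near 2.5 Pb/s, not SKA design values.

```python
# Illustrative data-rate arithmetic with assumed receiver parameters.
ANTENNAS = 250_000
SAMPLE_RATE = 1.25e9   # samples/s per polarization (assumed)
BITS_PER_SAMPLE = 4    # assumed ADC resolution
POLARIZATIONS = 2

per_antenna_bps = SAMPLE_RATE * BITS_PER_SAMPLE * POLARIZATIONS
total_pbps = ANTENNAS * per_antenna_bps / 1e15
print(f"{per_antenna_bps / 1e9:.0f} Gb/s per antenna, {total_pbps:.2f} Pb/s total")
```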


European Conference on Service-Oriented and Cloud Computing | 2012

Simplified authentication and authorization for RESTful services in trusted environments

Eric Brachmann; Gero Dittmann; Klaus-Dieter Schubert

In some trusted environments, such as an organization's intranet, local web services may be assumed to be trustworthy. This property can be exploited to simplify authentication and authorization protocols between resource providers and consumers, lowering the threshold for developing services and clients. Existing security solutions for RESTful services, in contrast, support untrusted services, a complexity-increasing capability that is not needed on an intranet with only trusted services. We propose a central security service with a lean API that handles both authentication and authorization for trusted RESTful services. A user trades credentials for a token that facilitates access to services. The services may query the security service for token authenticity and the roles granted to a user. The system provides fine-grained access control at the level of resources, following the role-based access control (RBAC) model. Resources are identified by their URLs, making the authorization system generic. The mapping of roles to users resides with the central security service and depends on the resource to be accessed. The mapping of permissions to roles is implemented individually by the services. We rely on secure channels and the trusted intermediaries characteristic of intranets to simplify the protocols involved and to make the security features easy to use, cutting the number of required API calls in half.
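
A hypothetical client-side sketch of the described flow: credentials are traded for a token, and a resource service validates the token and retrieves the user's roles in a single call. Endpoint paths and field names are invented; secure transport on the trusted intranet is assumed.

```python
import requests

# Hypothetical central security service on a trusted intranet.
SECURITY_SERVICE = "https://auth.intranet.example/api"

def get_token(user, password):
    """Trade credentials for an access token."""
    r = requests.post(f"{SECURITY_SERVICE}/tokens",
                      json={"user": user, "password": password})
    r.raise_for_status()
    return r.json()["token"]

def check_access(token, resource_url):
    """Called by a resource service: one call returns validity plus roles."""
    r = requests.get(f"{SECURITY_SERVICE}/tokens/{token}",
                     params={"resource": resource_url})
    r.raise_for_status()
    info = r.json()                      # e.g. {"valid": true, "roles": [...]}
    return info["valid"], info["roles"]

token = get_token("alice", "secret")
valid, roles = check_access(token, "https://svc.intranet.example/reports/42")
# The resource service then maps these roles to its own permissions (RBAC).
```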


International Journal of Parallel Programming | 2016

An Instrumentation Approach for Hardware-Agnostic Software Characterization

Andreea Anghel; Laura Mihaela Vasilescu; Giovanni Mariani; Rik Jongerius; Gero Dittmann

Simulators and empirical profiling data are often used to understand how suitable a specific hardware architecture is for an application. However, simulators can be slow, and empirical profiling-based methods can only provide insights about the existing hardware on which the applications are executed. While the insights obtained in this way are valuable, such methods cannot be used to evaluate a large number of system designs efficiently. Analytical performance evaluation models are fast alternatives, particularly well-suited for system design-space exploration. However, to be truly application-specific, they need to be combined with a workload model that captures relevant application characteristics. In this paper we introduce PISA, a framework based on the LLVM infrastructure that is able to generate such a model for sequential and parallel applications by performing hardware-independent characterization. Characteristics such as instruction-level parallelism, memory access patterns and branch behavior are analyzed per thread or process during application execution. To illustrate the potential of the framework, we provide a detailed characterization of a representative benchmark for graph-based analytics, Graph 500. Finally, we analyze how the properties extracted with PISA across Graph 500 and SPEC CPU2006 applications compare to measurements performed on x86 and POWER8 processors.
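
A toy stand-in for this kind of hardware-independent characterization: from an abstract instruction trace (opcode class plus optional memory address), compute the instruction mix and LRU-stack reuse distances. The trace is fabricated; PISA gathers such properties via LLVM instrumentation during application execution.

```python
from collections import Counter, OrderedDict

trace = [("load", 0x100), ("add", None), ("load", 0x140), ("store", 0x100),
         ("branch", None), ("load", 0x100), ("add", None), ("load", 0x180)]

mix = Counter(op for op, _ in trace)
print("instruction mix:", dict(mix))

# Reuse distance: number of distinct addresses touched since the last access
# to the same address (LRU stack distance); 'inf' marks a first access.
stack = OrderedDict()                    # most recently used key at the end
for op, addr in trace:
    if addr is None:
        continue
    if addr in stack:
        depth = list(reversed(stack)).index(addr)
        print(hex(addr), "reuse distance:", depth)
        stack.move_to_end(addr)
    else:
        print(hex(addr), "reuse distance: inf")
        stack[addr] = True
```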


IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing | 2017

Predicting Cloud Performance for HPC Applications: A User-Oriented Approach

Giovanni Mariani; Andreea Anghel; Rik Jongerius; Gero Dittmann

Cloud computing enables end users to execute high-performance computing applications by renting the required computing power. This pay-for-use approach enables small enterprises and startups to run HPC-related businesses with a significant saving in capital investment and a short time to market. When deploying an application in the cloud, the users may a) fail to understand the interactions of the application with the software layers implementing the cloud system, b) be unaware of some hardware details of the cloud system, and c) fail to understand how sharing part of the cloud system with other users might degrade application performance. These misunderstandings may lead the users to select suboptimal cloud configurations in terms of cost or performance. To aid the users in selecting the optimal cloud configuration for their applications, we suggest that the cloud provider generate a prediction model for the provided system. We propose applying machine-learning techniques to generate this prediction model. First, the cloud provider profiles a set of training applications by means of a hardware-independent profiler and then executes these applications on a set of training cloud configurations to collect actual performance values. The prediction model is trained to learn the dependencies of actual performance data on the application profile and cloud configuration parameters. The advantage of using a hardware-independent profiler is that the cloud users and the cloud provider can analyze applications on different machines and interface with the same prediction model. We validate the proposed methodology for a cloud system implemented with OpenStack. We apply the prediction model to the NAS parallel benchmarks. The resulting relative error is below 15%, and the Pareto-optimal cloud configurations found by maximizing application speed and minimizing execution cost on the prediction model are at most 15% away from the actual optimal solutions.
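
A minimal sketch of the training step with scikit-learn, assuming a random-forest regressor stands in for the paper's machine-learning technique: each row combines hardware-independent application-profile features with cloud-configuration parameters, and the target is measured run time. Feature names and data are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
# columns: [instructions, comm_fraction, vcpus, vm_count]  (assumed features)
X = rng.uniform([1e9, 0.0, 1, 1], [1e11, 0.8, 16, 8], size=(200, 4))
# Synthetic ground truth standing in for measured run times on training configs.
y = X[:, 0] / (X[:, 2] * X[:, 3]) * (1 + 2 * X[:, 1]) / 1e9

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
candidate = [[5e10, 0.3, 8, 4]]          # a new application on a new config
print("predicted run time:", model.predict(candidate)[0], "s")
```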


International Journal of Parallel Programming | 2016

Scaling Properties of Parallel Applications to Exascale

Giovanni Mariani; Andreea Anghel; Rik Jongerius; Gero Dittmann

A detailed profile of exascale applications helps to understand the computation, communication and memory requirements for exascale systems and provides the insight necessary for fine-tuning the computing architecture. Obtaining such a profile is challenging as exascale systems will process unprecedented amounts of data. Profiling applications at the target scale would require the exascale machine itself. In this work we propose a methodology to extrapolate the exascale profile from experimental observations over datasets feasible for today’s machines. Extrapolation models are carefully selected by means of statistical techniques and a high-level complexity analysis is included in the selection process to speed up the learning phase and to improve the accuracy of the final model. We extrapolate run-time properties of the target applications including information about the instruction mix, memory access pattern, instruction-level parallelism, and communication requirements. Compared to state-of-the-art techniques, the proposed methodology reduces the prediction error by an order of magnitude on the instruction count and improves the accuracy by up to 1.3
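
A sketch of the extrapolation idea: fit several candidate scaling laws to measurements at small problem sizes, keep the best-fitting one, and evaluate it at the target scale. The candidate forms and data are invented; the paper additionally folds a high-level complexity analysis into model selection.

```python
import numpy as np
from scipy.optimize import curve_fit

# Candidate scaling models, each linear in its parameters (a, b).
models = {
    "linear":    lambda n, a, b: a * n + b,
    "nlogn":     lambda n, a, b: a * n * np.log2(n) + b,
    "quadratic": lambda n, a, b: a * n**2 + b,
}

n = np.array([1e4, 2e4, 4e4, 8e4, 1.6e5])     # small, feasible problem sizes
count = 2.0 * n * np.log2(n) + 1e4            # synthetic instruction counts

# Select the model with the lowest mean squared fitting error.
best = min(models, key=lambda name: np.mean(
    (count - models[name](n, *curve_fit(models[name], n, count)[0])) ** 2))
params, _ = curve_fit(models[best], n, count)
print("selected:", best, "-> prediction at n=1e9:", models[best](1e9, *params))
```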
