Mario Kicherer
Karlsruhe Institute of Technology
Publications
Featured research published by Mario Kicherer.
high performance embedded architectures and compilers | 2011
Mario Kicherer; Rainer Buchty; Wolfgang Karl
Today's approaches to heterogeneous computing rely on either the programmer or dedicated programming models to integrate heterogeneous components efficiently. In this work, we propose an adaptive, cost-aware function-migration mechanism built on top of a lightweight hardware abstraction layer. With this mechanism, the highly dynamic task of choosing the most beneficial processing unit is hidden from the programmer while requiring only minor changes to the work and program flow. The migration mechanism transparently adapts to the current workload and system environment without the need for JIT compilation or binary translation. Our evaluation shows that the approach successfully adapts to new circumstances and predicts the most beneficial processing unit (PU). Through fine-grained PU selection, our solution achieves a speedup of up to 2.27 in average kernel execution time while introducing only marginal overhead when its services are not required.
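The selection idea described in the abstract can be sketched in a few lines: record the observed cost of each processing unit and migrate calls to the historically cheapest one. All names here (`kernel_cpu`, `kernel_gpu`, `run_adaptive`) are illustrative assumptions, not the paper's actual interface, and the cost model is deliberately minimal.

```python
import time

# Hypothetical per-PU implementations of the same kernel; a real GPU
# variant would offload the computation instead of running on the host.
def kernel_cpu(data):
    return [x * x for x in data]

def kernel_gpu(data):
    return [x * x for x in data]

IMPLEMENTATIONS = {"cpu": kernel_cpu, "gpu": kernel_gpu}
history = {}  # measured execution cost per processing unit

def run_adaptive(data):
    # Try each PU once, then migrate to the historically cheapest one.
    untried = [pu for pu in IMPLEMENTATIONS if pu not in history]
    pu = untried[0] if untried else min(history, key=history.get)
    start = time.perf_counter()
    result = IMPLEMENTATIONS[pu](data)
    history[pu] = time.perf_counter() - start
    return result
```

After one call per PU, every subsequent call goes to the unit with the lowest observed cost; a real system would additionally age old measurements so the mapping can follow workload changes.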
high performance embedded architectures and compilers | 2012
Mario Kicherer; Fabian Nowak; Rainer Buchty; Wolfgang Karl
Nowadays, many possible configurations of heterogeneous systems exist, posing new challenges to application development: different types of processing units usually require individual programming models with dedicated runtime systems and accompanying libraries. If these are absent on an end-user system, e.g. because the respective hardware is not present, an application linked against them will break. This hampers the portability of applications developed on one system and executed on other, differently configured heterogeneous systems. Moreover, the individual benefit of different processing units is normally not known in advance. In this work, we propose a technique to effectively decouple applications from their accelerator-specific parts and code. These parts are only linked on demand, so an application can be made portable across systems with different accelerators. As there are usually multiple hardware-specific implementations of a certain task, e.g. a CPU and a GPU version, a method is required to determine which are usable at all and which one is most suitable for execution on the current system. With our approach, application and hardware programmers can express the requirements and the abilities of the application and the hardware-specific implementations in a simplified manner. At runtime, the requirements and abilities are compared against the present hardware to determine the usable implementations of a task. If multiple implementations are usable, an online-learning, history-based selector is employed to determine the most efficient one. We show that our approach dynamically chooses the fastest usable implementation on several systems while introducing only negligible overhead itself. Applied to an MPI application, our mechanism enables the exploitation of local accelerators on different heterogeneous hosts without prior knowledge or modification of the application.
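The requirements-versus-abilities matching step described above can be sketched as a simple set comparison: keep only those implementations whose hardware requirements the current host satisfies. The descriptors and names below are hypothetical stand-ins for the paper's actual mechanism.

```python
# Hypothetical ability/requirement descriptors for one task.
PRESENT_HARDWARE = {"cpu", "sse2"}          # what this host offers

implementations = {
    "task_cpu":  {"cpu"},                   # requirements per implementation
    "task_sse":  {"cpu", "sse2"},
    "task_cuda": {"cuda"},                  # unusable here: no CUDA device
}

def usable(impls, hardware):
    """Keep only implementations whose requirements the host satisfies."""
    return {name for name, req in impls.items() if req <= hardware}

# usable(implementations, PRESENT_HARDWARE) → {"task_cpu", "task_sse"};
# among these, the history-based selector would then pick the fastest
# implementation observed so far.
```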
automation, robotics and control systems | 2009
Rainer Buchty; David Kramer; Mario Kicherer; Wolfgang Karl
When targeting hardware accelerators and reconfigurable processing units, the question of programmability arises, i.e. how different implementations of individual, configuration-specific functions are provided. Conventionally, this is resolved either at compile time for a specific hardware environment, by initialization routines at program start, or by decision trees at run time. Such techniques are, however, hardly applicable to dynamically changing architectures. Furthermore, these approaches show conceptual drawbacks: they require access to the source code and upfront knowledge of future system configurations, and they overload the code with reconfiguration-related control routines. We therefore present a low-overhead technique for resolving individual functions on demand. This technique can be applied in two different manners; we discuss the benefits of the individual implementations and show how both approaches can be used to establish code compatibility between different heterogeneous, reconfigurable, and parallel architectures. Furthermore, we show that both approaches incur only insignificant overhead.
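On-demand function resolving can be illustrated by analogy with lazy PLT binding: the first call goes through a resolver that binds the name to a concrete, configuration-specific implementation, and later calls use the bound function directly. All names below are illustrative assumptions, not the paper's actual interface.

```python
# Two configuration-specific variants of the same function; which one is
# registered in AVAILABLE depends on the current hardware configuration.
def impl_fpga(x):
    return x + 1    # accelerator variant (stand-in for offloaded code)

def impl_software(x):
    return x + 1    # generic software fallback

AVAILABLE = {"accumulate": impl_software}   # what this configuration offers

class LazyTable:
    """Resolve function names on first use, then call the bound target."""
    def __init__(self):
        self.bound = {}

    def call(self, name, *args):
        if name not in self.bound:          # resolve on demand, once
            self.bound[name] = AVAILABLE[name]
        return self.bound[name](*args)

table = LazyTable()
```

On a reconfiguration, clearing `table.bound` would force re-resolution against the new configuration without touching the application code.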
international conference / workshop on embedded computer systems: architectures, modeling and simulation | 2009
Rainer Buchty; Mario Kicherer; David Kramer; Wolfgang Karl
In this paper, we present a particularly lightweight, integrative approach to programming and executing applications on heterogeneous, dynamically reconfigurable parallel systems. Based on an analysis of existing approaches, we focus strictly on compatibility and a lightweight design. Our approach therefore follows an embrace-and-extend strategy, achieving the desired functionality by adopting and augmenting existing system services. We implemented this concept on the Linux OS and demonstrated its suitability on a heterogeneous platform comprising IA32 multicore processors and current FPGA accelerator hardware using state-of-the-art HyperTransport interconnect technology.
Journal of Systems Architecture | 2015
Mario Kicherer; Wolfgang Karl
The best mapping of a task to one or more processing units in a heterogeneous system depends on multiple variables. Several runtime-system-based approaches have been proposed that automatically determine the best mapping under given circumstances. Some of them also consider dynamic events, like varying problem sizes or resource competition, that may change the best mapping during application runtime, but only a few consider that task execution may fail. While aging and overheating are well-known causes of sudden faults, the ongoing miniaturization and the growing complexity of heterogeneous computing are expected to create further threats to successful application execution. However, if properly incorporated, heterogeneous systems also offer the opportunity to recover from different types of faults in hardware as well as in software. In this work, we propose a combination of both topics, dynamic performance-oriented task mapping and dependability, to leverage this opportunity. As we show, this combination not only enables tolerating faults in hardware and software with minor assistance from the developer; through a new metric and automatic data management, it also benefits application development itself and application performance in the case of faults.
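The combination of performance-oriented mapping and dependability can be sketched as ranked execution with fallback: try implementations in order of expected speed and, when one faults, retry the task on the next-best processing unit. The names and the simulated failure below are illustrative assumptions, not the paper's mechanism.

```python
# Hypothetical implementations of one task on different processing units.
def fast_but_faulty(x):
    raise RuntimeError("simulated accelerator fault")

def slow_but_safe(x):
    return x * 2

RANKED = [fast_but_faulty, slow_but_safe]   # best expected runtime first

def execute_with_fallback(x):
    """Run the task, falling back to the next-best PU on a fault."""
    errors = []
    for impl in RANKED:
        try:
            return impl(x)                  # success: use this result
        except RuntimeError as err:
            errors.append(err)              # fault: try the next-best PU
    raise RuntimeError(f"all implementations failed: {errors}")
```

A real runtime system would additionally restore the task's input data before the retry, which is where the automatic data management mentioned above comes in.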
23rd International Conference on Architecture of Computing Systems (ARCS 2010) | 2011
Fabian Nowak; Mario Kicherer; Rainer Buchty; Wolfgang Karl
23rd International Conference on Architecture of Computing Systems (ARCS 2010) | 2011
Mario Kicherer; Fabian Nowak; Rainer Buchty; Wolfgang Karl
arXiv: Operating Systems | 2014
Mario Kicherer; Wolfgang Karl
Archive | 2011
Ruben Titos-Gil; Manuel E. Acacio; José M. García; Tim Harris; A. Cristal; Osman S. Unsal; Ibrahim Hur; Mateo Valero; Yongjoo Kim; Jongeun Lee; Yunheung Paek; Madhura Purnaprajna; Paolo Ienne; Petar Radojković; Sylvain Girbal; Arnaud Grasset; Eduardo Quiñones; Sami Yehia; Francisco J. Cazorla; Alejandro Rico; Felipe Cabarcas; Carlos Villavieja; Milan Pavlovic; Augusto Vega; Yoav Etsion; Alex Ramirez; Selma Saidi; Pranav Tendulkar; Thierry Lepley; Oded Maler
arcs workshops | 2010
Fabian Nowak; Mario Kicherer; Rainer Buchty; Wolfgang Karl