Frank Bellosa
Karlsruhe Institute of Technology
Publications
Featured research published by Frank Bellosa.
ACM SIGOPS European Workshop | 2000
Frank Bellosa
A prerequisite of energy-aware scheduling is precise knowledge of any activity inside the computer system. Embedded hardware monitors (e.g., processor performance counters) have proved to offer valuable information in the field of performance analysis. The same approach can be applied to investigate the energy-usage patterns of individual threads. We use information about active hardware units (e.g., integer/floating-point unit, cache/memory interface) gathered by event counters to establish thread-specific energy accounting. The evaluation shows that the correlation of events and energy values provides the necessary information for energy-aware scheduling policies.

Our approach to OS-directed power management adds the energy-usage pattern to the runtime context of a thread. Depending on the field of application, we present two scenarios that benefit from applying energy-usage patterns: workstations with passive cooling on the one hand and battery-powered mobile systems on the other.

Energy-aware scheduling evaluates the energy usage of each thread and throttles system activity so that the scheduling goal is achieved. In workstations, we throttle the system if the average energy use exceeds a predefined power-dissipation capacity. This makes possible a compact, noiseless, and affordable system design that meets sporadic yet high demands for computing power. Nowadays, more and more mobile systems offer reducible clock speed and dynamic voltage scaling. Energy-aware scheduling can employ these features to yield a longer battery life by slowing down low-priority threads while preserving a certain quality of service.
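The accounting scheme boils down to a weighted sum over per-thread counter deltas, charged at every context switch. The following C sketch illustrates that idea; the counter set and the per-event energy weights are hypothetical placeholders, since the paper derives such weights empirically for a concrete processor.

```c
/* Hypothetical sketch of per-thread energy accounting from event
 * counters, in the spirit of the paper. The counter set and the
 * per-event energy weights are illustrative assumptions, not the
 * values used by the authors. */
#include <stdio.h>

enum { EV_INT_OPS, EV_FP_OPS, EV_L2_MISS, EV_MEM_REF, EV_COUNT };

/* Assumed energy cost per event in nanojoules (made-up numbers). */
static const double nj_per_event[EV_COUNT] = { 1.0, 2.5, 30.0, 12.0 };

struct thread_acct {
    unsigned long long last[EV_COUNT]; /* counter snapshot at switch-in */
    double energy_nj;                  /* accumulated energy estimate  */
};

/* Called on every context switch: charge the outgoing thread with the
 * energy implied by the counter deltas since it was switched in. */
void account_energy(struct thread_acct *t,
                    const unsigned long long now[EV_COUNT])
{
    for (int e = 0; e < EV_COUNT; e++) {
        t->energy_nj += (double)(now[e] - t->last[e]) * nj_per_event[e];
        t->last[e] = now[e];
    }
}

int main(void)
{
    struct thread_acct t = { { 0, 0, 0, 0 }, 0.0 };
    unsigned long long sample[EV_COUNT] = { 1000000, 20000, 5000, 80000 };
    account_energy(&t, sample);
    printf("estimated energy: %.1f uJ\n", t.energy_nj / 1000.0);
    return 0;
}
```

A scheduler built on top of this estimate can then throttle or slow down exactly the threads responsible for the energy consumption, rather than the system as a whole.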
Compilers, Architecture, and Synthesis for Embedded Systems | 2002
Andreas Weissel; Frank Bellosa
Scalability of the core frequency is a common feature of low-power processor architectures. Many heuristics for frequency scaling have been proposed to find the best trade-off between energy efficiency and computational performance. With complex applications exhibiting unpredictable behavior, these heuristics cannot reliably adjust the operating point of the hardware, because they do not know where the energy is spent and why performance is lost.

Embedded hardware monitors in the form of event counters have proven to offer valuable information in the field of performance analysis. We demonstrate that counter values can also reveal the power-specific characteristics of a thread.

In this paper we propose an energy-aware scheduling policy for non-real-time operating systems that benefits from event counters. By exploiting the information from these counters, the scheduler determines the appropriate clock frequency for each individual thread running in a time-sharing environment. A recurrent analysis of the thread-specific energy and performance profile allows the frequency to be adjusted to the behavioral changes of the application. While the clock frequency may vary over a wide range, application performance should suffer only slightly (e.g., a 10% performance loss compared to execution at the highest clock speed). Because of the similarity to a car's cruise control, we call our scheduling policy Process Cruise Control. This adaptive clock scaling is accomplished by the operating system without any application support.

Process Cruise Control has been implemented on the Intel XScale architecture, which offers a variety of frequencies and a set of configurable event counters. Energy measurements of the target architecture under variable load show the advantage of the proposed approach.
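The policy can be pictured as a small model-plus-search step at each reevaluation. Here is a minimal C sketch of such a frequency-selection step, assuming a simple two-component performance model (core-clock-dependent time plus clock-independent memory time); the frequency list and all constants are illustrative, not the calibrated model from the paper.

```c
/* Illustrative sketch of the frequency-selection idea behind Process
 * Cruise Control. The performance model (runtime split into a
 * core-clock part and a memory part) and all constants are assumptions
 * for demonstration, not the authors' calibrated model. */
#include <stdio.h>

static const int freqs_mhz[] = { 333, 400, 466, 533 };  /* XScale-like */
static const int nfreqs = 4;

/* Predicted runtime at frequency f relative to the maximum frequency,
 * given the fraction of time a thread stalls on memory (mem_frac).
 * Memory time is assumed independent of the core clock. */
static double rel_runtime(double mem_frac, int f_mhz)
{
    double f_max = (double)freqs_mhz[nfreqs - 1];
    return (1.0 - mem_frac) * (f_max / f_mhz) + mem_frac;
}

/* Pick the lowest frequency whose predicted slowdown stays within
 * the allowed performance-loss budget (e.g., 10%). */
int pick_frequency(double mem_frac, double max_loss)
{
    for (int i = 0; i < nfreqs; i++)
        if (rel_runtime(mem_frac, freqs_mhz[i]) <= 1.0 + max_loss)
            return freqs_mhz[i];
    return freqs_mhz[nfreqs - 1];
}

int main(void)
{
    /* A memory-bound thread (85% stall time) can be clocked low... */
    printf("memory-bound: %d MHz\n", pick_frequency(0.85, 0.10));
    /* ...while a compute-bound thread must stay near full speed. */
    printf("compute-bound: %d MHz\n", pick_frequency(0.05, 0.10));
    return 0;
}
```

In the real policy, the memory-boundedness estimate would come from the configurable event counters mentioned in the abstract and would be refreshed as the application's behavior changes.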
European Conference on Computer Systems | 2006
Andreas Merkel; Frank Bellosa
Actions usually taken to prevent processors from overheating, such as decreasing the frequency or stopping the execution flow, also degrade performance. Multiprocessor systems, however, offer the possibility of migrating the task that caused a CPU to overheat to some other, cooler CPU, so throttling becomes only a last resort, taken if all of a system's processors are hot. Additionally, the scheduler can take advantage of the energy characteristics of individual tasks and distribute hot tasks as well as cool tasks evenly among all CPUs.

This work presents a mechanism for determining the energy characteristics of tasks by means of event-monitoring counters, and an energy-aware scheduling policy that strives to assign tasks to CPUs in a way that avoids overheating individual CPUs. Our evaluations show that the benefit of avoiding throttling outweighs the overhead of additional task migrations, and that energy-aware scheduling in many cases increases the system's throughput.
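One way to picture the placement policy is to treat each task's counter-derived power estimate as a load to be spread over the coolest cores. A toy C sketch of such a greedy assignment follows; the power values and the linear thermal model are invented for illustration and are not the paper's mechanism.

```c
/* Toy sketch of the balancing idea: give the most power-hungry tasks
 * to the coolest CPUs so no single core overheats. Task power values
 * and the thermal model are illustrative; the paper derives task
 * energy characteristics from event counters at run time. */
#include <stdio.h>
#include <stdlib.h>

struct task { const char *name; double watts; }; /* counter-derived */

static int by_watts_desc(const void *a, const void *b)
{
    double d = ((const struct task *)b)->watts
             - ((const struct task *)a)->watts;
    return (d > 0) - (d < 0);
}

int main(void)
{
    struct task tasks[] = {
        { "hot-loop", 22.0 }, { "mem-bound", 9.0 },
        { "idle-ish", 3.0 },  { "hot-fp", 19.0 },
    };
    double cpu_temp[2] = { 55.0, 48.0 };  /* current core temperatures */
    const double deg_per_watt = 0.5;      /* made-up thermal constant  */

    qsort(tasks, 4, sizeof tasks[0], by_watts_desc);
    for (int i = 0; i < 4; i++) {
        int c = cpu_temp[0] <= cpu_temp[1] ? 0 : 1;  /* coolest CPU */
        cpu_temp[c] += tasks[i].watts * deg_per_watt;
        printf("%-9s -> CPU%d (now %.1f C)\n", tasks[i].name, c, cpu_temp[c]);
    }
    return 0;
}
```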
European Conference on Computer Systems | 2010
Andreas Merkel; Jan Stoess; Frank Bellosa
In multicore systems, shared resources such as caches or the memory subsystem can lead to contention between applications running on different cores, entailing reduced performance and poor energy efficiency. The characteristics of individual applications, the assignment of applications to machines and execution contexts, and the selection of processor frequencies have a dramatic impact on resource contention, performance, and energy efficiency. We employ the concept of task activity vectors for characterizing applications by their resource utilization. Based on this characterization, we apply migration and co-scheduling policies that improve performance and energy efficiency by combining applications that use complementary resources, and we use frequency scaling when scheduling cannot avoid contention owing to inauspicious workloads. We integrate the policies into an operating system scheduler and into a virtualization system, allowing placement decisions to be made both within and across physical nodes and reducing contention both for individual tasks and for complete applications. Our evaluation, based on the Linux kernel and the KVM virtualization environment, shows that resource-conscious scheduling reduces the energy-delay product considerably.
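A natural reading of "combining applications that use complementary resources" is to minimize the overlap of their activity vectors. The sketch below pairs the two tasks with the smallest dot product of their utilization vectors; the vector components and the metric are illustrative assumptions, not the paper's exact policy.

```c
/* Sketch of pairing by activity vectors: co-schedule the two tasks
 * whose resource-utilization vectors overlap least (smallest dot
 * product), so that a cache-heavy task runs alongside a compute-heavy
 * one. Vector contents are illustrative; the paper builds them from
 * hardware event counters. */
#include <stdio.h>

#define NRES 3   /* e.g., ALU, cache, memory-bus utilization in [0,1] */

struct task { const char *name; double v[NRES]; };

static double overlap(const struct task *a, const struct task *b)
{
    double s = 0;
    for (int i = 0; i < NRES; i++) s += a->v[i] * b->v[i];
    return s;
}

int main(void)
{
    struct task t[] = {
        { "stream", { 0.2, 0.9, 0.9 } },   /* memory/cache heavy */
        { "crunch", { 0.9, 0.2, 0.1 } },   /* ALU heavy          */
        { "mixed",  { 0.6, 0.6, 0.5 } },
    };
    int bi = 0, bj = 1;
    double best = overlap(&t[0], &t[1]);
    for (int i = 0; i < 3; i++)
        for (int j = i + 1; j < 3; j++)
            if (overlap(&t[i], &t[j]) < best) {
                best = overlap(&t[i], &t[j]); bi = i; bj = j;
            }
    printf("co-schedule %s with %s (overlap %.2f)\n",
           t[bi].name, t[bj].name, best);
    return 0;
}
```

When no complementary pairing exists, the abstract's fallback applies: scale down the frequency to soften the contention instead.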
Journal of Parallel and Distributed Computing | 1996
Frank Bellosa; Martin Steckermeier
Large caches used in scalable shared-memory architectures can avoid high memory access times only if data is referenced within the address scope of the cache. Consequently, locality is the key issue in multiprocessor performance. While CPU utilization still determines the scheduling decisions of contemporary schedulers, we propose novel scheduling policies based on locality information derived from cache-miss counters. A locality-conscious scheduler can reduce the cost of reloading the cache after each context switch. Thus, the potential benefit of using locality information increases with the frequency of scheduling decisions. Lightweight threads have become a common abstraction in the field of programming languages and operating systems. User-level schedulers make frequent context switches affordable and therefore profit most from locality information when the lifetime of cache lines exceeds scheduling cycles. This paper examines the performance implications of using locality information in thread scheduling algorithms for scalable shared-memory multiprocessors. A prototype implementation shows that a locality-conscious scheduler outperforms approaches that ignore locality information.
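The core idea, preferring the runnable thread with the largest cache investment on the current CPU, fits in a few lines. In this hypothetical C sketch, the footprint estimate stands in for information that would in practice be derived from cache-miss counters.

```c
/* Minimal sketch of a locality-conscious pick-next: prefer the
 * runnable thread estimated to still have the most data in this CPU's
 * cache. The footprint field is a stand-in for a cache-miss-counter
 * derived estimate and is illustrative only. */
#include <stdio.h>

struct thread {
    const char *name;
    int last_cpu;       /* CPU the thread last ran on              */
    double footprint;   /* estimated cache lines still resident    */
};

int pick_next(const struct thread *rq, int n, int this_cpu)
{
    int best = 0;
    double best_score = -1.0;
    for (int i = 0; i < n; i++) {
        /* only footprint left in *this* CPU's cache counts */
        double score = rq[i].last_cpu == this_cpu ? rq[i].footprint : 0.0;
        if (score > best_score) { best_score = score; best = i; }
    }
    return best;
}

int main(void)
{
    struct thread rq[] = {
        { "A", 0, 4000.0 }, { "B", 1, 9000.0 }, { "C", 0, 7000.0 },
    };
    printf("CPU0 runs %s\n", rq[pick_next(rq, 3, 0)].name); /* C: warm */
    return 0;
}
```

Because user-level schedulers switch contexts so frequently, even a cheap heuristic like this pays off as long as cache lines survive between scheduling cycles.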
European Conference on Computer Systems | 2008
Andreas Merkel; Frank Bellosa
Non-uniform utilization of functional units, in combination with hardware mechanisms such as clock gating, leads to different power consumption in different parts of a processor chip. This in turn leads to non-uniform temperature distributions and problematic local hotspots, depending on the characteristics of the currently running task. The operating system's scheduler, responsible for deciding which task to run at what time, can influence the temperature distribution. Our work investigates what the operating system can do to alleviate the problem of hotspots. We propose task activity vectors describing which functional units a task uses and to what degree. With the knowledge provided by these vectors, the scheduler can schedule tasks that use different units successively, distribute tasks that use a particular unit excessively over the system's processors, or mix tasks using different units on an SMT processor. We implemented several vector-based scheduling strategies for Linux. Our evaluations show that vector-based scheduling considerably reduces hotspots.
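One member of this family of strategies, running tasks with differing activity vectors back to back so that hot units get a chance to cool, might look like the following C sketch. The unit set, vectors, and distance metric are illustrative choices, not the exact strategies evaluated in the paper.

```c
/* Sketch of one vector-based strategy: after a task that stressed one
 * set of functional units, pick the runnable task whose activity
 * vector differs most, letting the hot units cool down. Vectors and
 * the distance metric are illustrative. Compile with -lm. */
#include <stdio.h>
#include <math.h>

#define NUNITS 3   /* e.g., integer unit, FPU, load/store unit */

struct task { const char *name; double v[NUNITS]; };

static double dist(const double *a, const double *b)
{
    double s = 0;
    for (int i = 0; i < NUNITS; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
    return sqrt(s);
}

int pick_next(const struct task *rq, int n, const double *prev)
{
    int best = 0;
    double d, best_d = -1.0;
    for (int i = 0; i < n; i++)
        if ((d = dist(rq[i].v, prev)) > best_d) { best_d = d; best = i; }
    return best;
}

int main(void)
{
    struct task rq[] = {
        { "int-heavy", { 0.9, 0.1, 0.3 } },
        { "fp-heavy",  { 0.2, 0.9, 0.3 } },
    };
    double prev[NUNITS] = { 0.9, 0.1, 0.2 }; /* just ran integer code */
    printf("next: %s\n", rq[pick_next(rq, 2, prev)].name); /* fp-heavy */
    return 0;
}
```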
High Performance Computing and Communications | 2013
Mathias Gottschlag; Marius Hillenbrand; Jens Kehne; Jan Stoess; Frank Bellosa
Over the last few years, running high-performance computing applications in the cloud has become feasible. At the same time, GPGPUs are delivering unprecedented performance for HPC applications. Cloud providers thus face the challenge of integrating GPGPUs into their virtualized platforms, which has proven difficult for current virtualization stacks. In this paper, we present LoGV, an approach to virtualizing GPGPUs by leveraging protection mechanisms already present in modern hardware. LoGV enables sharing of GPGPUs between VMs as well as VM migration without modifying the host driver or the guest's CUDA runtime. LoGV allocates resources securely in the hypervisor, which then grants applications direct access to these resources, relying on GPGPU hardware features to guarantee mutual protection between applications. Experiments with our prototype show an overhead of less than 4% compared to native execution.
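The split the abstract describes, a hypervisor-mediated control path and a direct, hardware-protected data path, can be caricatured in a few lines of C. This is a conceptual sketch only; LoGV's real interfaces and the GPU's channel protection are far richer than this toy ownership check.

```c
/* Conceptual sketch (not LoGV's actual interface): the hypervisor
 * handles the slow control path (resource allocation), while the fast
 * data path goes straight to the hardware, whose own protection check
 * is modeled here by a simple per-channel ownership test. */
#include <stdio.h>
#include <stdbool.h>

#define NCHANNELS 8

struct gpu { int owner[NCHANNELS]; };   /* -1 = free, else VM id */

/* Control path: the hypervisor assigns a hardware channel to a VM. */
int hv_alloc_channel(struct gpu *g, int vm)
{
    for (int c = 0; c < NCHANNELS; c++)
        if (g->owner[c] == -1) { g->owner[c] = vm; return c; }
    return -1;
}

/* Data path: the guest submits work directly; the "hardware" checks
 * ownership, standing in for the GPU's per-channel protection. */
bool gpu_submit(const struct gpu *g, int vm, int chan)
{
    return chan >= 0 && chan < NCHANNELS && g->owner[chan] == vm;
}

int main(void)
{
    struct gpu g = { { -1, -1, -1, -1, -1, -1, -1, -1 } };
    int c0 = hv_alloc_channel(&g, 0);            /* VM0 gets a channel */
    printf("VM0 on own channel: %s\n", gpu_submit(&g, 0, c0) ? "ok" : "denied");
    printf("VM1 on VM0 channel: %s\n", gpu_submit(&g, 1, c0) ? "ok" : "denied");
    return 0;
}
```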
IEE Proceedings - Software | 2001
Uwe Rastofer; Frank Bellosa
The aim of component-based software engineering is to create applications from reusable, exchangeable, and connectable components. However, current component models lack support for important concepts of distributed embedded real-time systems, such as execution time and resource usage. These non-functional properties of a component are as important as its functionality. In addition, the non-functional properties are influenced by the platform on which the component is executed. A component model is proposed that separates the component's functionality from the platform-specific issues of concurrency, synchronisation, and distribution. A technique is presented that describes the behaviour of a component in a path-based notation similar to use case maps (UCMs). A method for deducing from these descriptions the behaviour of an application consisting of connected components is also shown. The paths also contain information on the real-time requirements of the application. The authors also show how to adapt the components to an execution platform and how to create real-time applications with predictable properties from these components.
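The key point, that non-functional properties travel with the component and can be composed along a path, admits a small illustration. In this hypothetical C sketch, each component declares a worst-case execution time for the target platform, and a UCM-like path is checked against a deadline; the structures are assumptions, not the paper's notation.

```c
/* Illustrative sketch: non-functional properties are attached to each
 * component, and a path through connected components can be checked
 * against a deadline before deployment. Structures and numbers are
 * hypothetical, not the paper's component model. */
#include <stdio.h>
#include <stdbool.h>

struct component {
    const char *name;
    double wcet_us;  /* worst-case execution time on the target platform */
};

/* A use-case-map-like path is the sequence of components it visits. */
bool path_meets_deadline(const struct component *path[], int n,
                         double deadline_us)
{
    double total = 0;
    for (int i = 0; i < n; i++) total += path[i]->wcet_us;
    return total <= deadline_us;
}

int main(void)
{
    struct component sensor = { "sensor",   120.0 };
    struct component filter = { "filter",   300.0 };
    struct component act    = { "actuator",  80.0 };
    const struct component *path[] = { &sensor, &filter, &act };
    printf("deadline met: %s\n",
           path_meets_deadline(path, 3, 600.0) ? "yes" : "no");
    return 0;
}
```

Because the WCET figures are platform-specific, adapting the components to a new execution platform means substituting new property values and re-running exactly this kind of check.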
Workshop on Hot Topics in Operating Systems | 2001
M. Golm; J. Kleinöder; Frank Bellosa
Early type-safe operating systems were hampered by poor performance. Contrary to these experiences, we show that an operating system founded on an object-oriented, type-safe intermediate code can compete with MMU-based microkernels in terms of performance while widening the realm of possibilities. Moving from hardware-based protection to software-based protection offers new options for operating system quality, flexibility, and versatility that are superior to traditional process models based on MMU protection. However, using a type-safe language such as Java alone is not sufficient to achieve an improvement. While other Java operating systems adopted a traditional process concept, JX implements fine-grained protection boundaries. The JX system architecture consists of a set of Java components executing on the JX core, which is responsible for system initialization, CPU context switching, and low-level domain management. The Java code is organized in components which are loaded into domains, verified, and translated to native code. JX runs on commodity PC hardware, supports network communication and a frame-grabber device, and contains an Ext2-compatible file system. Without extensive optimization, this file system already reaches 50% of the throughput of Linux.
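As a rough structural picture of the architecture described above: components are loaded into domains, and the core mediates only domain management. The C sketch below models just that bookkeeping; in JX itself the loaded code would be verified, translated to native code, and isolated by type safety rather than by anything visible in this toy, and all names here are hypothetical.

```c
/* Toy model of the structure described in the abstract: components
 * are loaded into domains, the unit of protection; the core only
 * tracks domains. Verification and translation to native code, which
 * provide the actual protection in JX, are represented by a stub. */
#include <stdio.h>

#define MAXCOMP 4

struct domain {
    const char *name;
    const char *components[MAXCOMP]; /* verified, translated components */
    int ncomp;
};

/* In JX, the component would be verified and compiled to native code
 * at this point; here we only record it. */
int domain_load(struct domain *d, const char *component)
{
    if (d->ncomp >= MAXCOMP) return -1;
    d->components[d->ncomp++] = component;
    return 0;
}

int main(void)
{
    struct domain fs = { "fs-domain", { 0 }, 0 };
    domain_load(&fs, "ext2fs");
    domain_load(&fs, "blockdev");
    printf("%s holds %d components\n", fs.name, fs.ncomp);
    return 0;
}
```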
IEEE International Conference on Cloud Computing Technology and Science | 2012
Marius Hillenbrand; Viktor Mauch; Jan Stoess; Konrad Miller; Frank Bellosa
High Performance Computing (HPC) employs fast interconnect technologies to provide low communication and synchronization latencies for tightly coupled parallel compute jobs. Contemporary HPC clusters have a fixed capacity and static runtime environments; they cannot elastically adapt to dynamic workloads, and they provide only a limited selection of applications, libraries, and system software. In contrast, a cloud model for HPC clusters promises more flexibility, as it provides elastic virtual clusters on demand, something that is not possible with physically owned clusters. In this paper, we present an approach that makes it possible to use InfiniBand clusters for HPC cloud computing. We propose a performance-driven design of an HPC IaaS layer for InfiniBand, which provides throughput- and latency-aware virtualization of nodes, networks, and network topologies, as well as an approach to an HPC-aware, multi-tenant cloud management system for elastic virtualized HPC compute clusters.
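Topology-aware virtualization of this kind implies placement decisions such as the following hypothetical heuristic: pack a virtual cluster's nodes under one leaf switch when capacity allows, so MPI neighbors stay one hop apart. The topology model and the heuristic are illustrative, not the paper's actual placement algorithm.

```c
/* Sketch of a topology-aware placement heuristic consistent with the
 * goals above: prefer placing a whole virtual cluster under a single
 * leaf switch. The two-level topology and the greedy choice are
 * illustrative assumptions. */
#include <stdio.h>

#define NHOSTS 6

struct host { int leaf_switch; int free_slots; };

/* Place `need` VMs; prefer the leaf switch with the most free slots.
 * Returns the chosen switch, or -1 if the cluster must span switches. */
int place_cluster(const struct host *h, int n, int need)
{
    int best_sw = -1, best_free = -1;
    for (int sw = 0; sw < n; sw++) {   /* candidate switch ids */
        int free = 0;
        for (int i = 0; i < n; i++)
            if (h[i].leaf_switch == sw) free += h[i].free_slots;
        if (free > best_free) { best_free = free; best_sw = sw; }
    }
    return best_free >= need ? best_sw : -1;
}

int main(void)
{
    struct host hosts[NHOSTS] = {
        { 0, 1 }, { 0, 2 }, { 0, 1 },   /* under leaf switch 0 */
        { 1, 4 }, { 1, 4 }, { 1, 0 },   /* under leaf switch 1 */
    };
    int sw = place_cluster(hosts, NHOSTS, 6);
    printf("place 6-node virtual cluster under leaf switch %d\n", sw);
    return 0;
}
```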