Is this you? Create Your Porfile

Jan Stoess

Karlsruhe Institute of Technology

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Jan Stoess is active.

Explore More

Publication

Featured researches published by Jan Stoess.

european conference on computer systems | 2010

Resource-conscious scheduling for energy efficiency on multicore processors

Andreas Merkel; Jan Stoess; Frank Bellosa

In multicore systems, shared resources such as caches or the memory subsystem can lead to contention between applications running on different cores, entailing reduced performance and poor energy efficiency. The characteristics of individual applications, the assignment of applications to machines and execution contexts, and the selection of processor frequencies have a dramatic impact on resource contention, performance, and energy efficiency. We employ the concept of task activity vectors for caracterizing applications by resource utilization. Based on this characterization, we apply migration and co-scheduling policies that improve performance and energy efficiency by combining applications that use complementary resources, and use frequency scaling when scheduling cannot avoid contention owing to inauspicious workloads. We integrate the policies into an operating system scheduler and into a virtualization system, allowing placement decisions to be made both within and across physical nodes, and reducing contention both for individual tasks and complete applications. Our evaluation based on the Linux operating system kernel and the KVM virtualization environment shows that resource-conscious scheduling reduces the energy delay product considerably.

high performance distributed computing | 2010

Providing a cloud network infrastructure on a supercomputer

Jonathan Appavoo; Amos Waterland; Dilma Da Silva; Volkmar Uhlig; Bryan S. Rosenburg; Eric Van Hensbergen; Jan Stoess; Robert W. Wisniewski; Udo Steinberg

Supercomputers and clouds both strive to make a large number of computing cores available for computation. More recently, similar objectives such as low-power, manageability at scale, and low cost of ownership are driving a more converged hardware and software. Challenges remain, however, of which one is that current cloud infrastructure does not yield the performance sought by many scientific applications. A source of the performance loss comes from virtualization and virtualization of the network in particular. This paper provides an introduction and analysis of a hybrid supercomputer software infrastructure, which allows direct hardware access to the communication hardware for the necessary components while providing the standard elastic cloud infrastructure for other components.

high performance computing and communications | 2013

LoGV: Low-Overhead GPGPU Virtualization

Mathias Gottschlag; Marius Hillenbrand; Jens Kehne; Jan Stoess; Frank Bellosa

Over the last few years, running high performance computing applications in the cloud has become feasible. At the same time, GPGPUs are delivering unprecedented performance for HPC applications. Cloud providers thus face the challenge to integrate GPGPUs into their virtualized platforms, which has proven difficult for current virtualization stacks. In this paper, we present LoGV, an approach to virtualize GPGPUs by leveraging protection mechanisms already present in modern hardware. LoGV enables sharing of GPGPUs between VMs as well as VM migration without modifying the host driver or the guests CUDA runtime. LoGV allocates resources securely in the hyper visor which then grants applications direct access to these resources, relying on GPGPU hardware features to guarantee mutual protection between applications. Experiments with our prototype have shown an overhead of les.s than 4% compared to native execution.

ieee international conference on cloud computing technology and science | 2012

Virtual InfiniBand clusters for HPC clouds

Marius Hillenbrand; Viktor Mauch; Jan Stoess; Konrad Miller; Frank Bellosa

High Performance Computing (HPC) employs fast interconnect technologies to provide low communication and synchronization latencies for tightly coupled parallel compute jobs. Contemporary HPC clusters have a fixed capacity and static runtime environments; they cannot elastically adapt to dynamic workloads, and provide a limited selection of applications, libraries, and system software. In contrast, a cloud model for HPC clusters promises more flexibility, as it provides elastic virtual clusters to be available on-demand. This is not possible with physically owned clusters. In this paper, we present an approach that makes it possible to use InfiniBand clusters for HPC cloud computing. We propose a performance-driven design of an HPC IaaS layer for InfiniBand, which provides throughput and latency-aware virtualization of nodes, networks, and network topologies, as well as an approach to an HPC-aware, multi-tenant cloud management system for elastic virtualized HPC compute clusters.

Operating Systems Review | 2007

Towards effective user-controlled scheduling for microkernel-based systems

Jan Stoess

With μ-kernel based systems becoming more and more prevalent, the demand for extensible resource management raises - and with it the demand for flexible thread scheduling. In this paper, we investigate the benefits and costs of a μ-kernel that exports scheduling from the kernel to user level. A key idea of our approach is to involve the user level whenever the μ-kernel encounters a situation that is ambiguous with respect to scheduling, and to permit the kernel to resolve the ambiguity based on user decisions. A further key aspect is that we rely on a generic, protection domain neutral interface between kernel and applications. For evaluation, we have developed a hierarchical user level scheduling architecture for the L4 μ-kernel, and a virtualization environment running on its top. Our environment supports Linux 2.6.9 guest operating systems on IA-32 processors. Experiments indicate an application overhead between 0 and 10 percent compared to a pure in-kernel scheduler solution, but also demonstrate that our architecture enables effective and accurate user-directed scheduling.

international workshop on runtime and operating systems for supercomputers | 2011

A light-weight virtual machine monitor for Blue Gene/P

Jan Stoess; Jonathan Appavoo; Udo Steinberg; Amos Waterland; Volkmar Uhlig; Jens Kehne

In this paper, we present a light-weight, micro--kernel-based virtual machine monitor (VMM) for the Blue Gene/P Supercomputer. Our VMM comprises a small μ-kernel with virtualization capabilities and, atop, a user-level VMM component that manages virtual BG/P cores, memory, and interconnects; we also support running native applications directly atop the μ-kernel. Our design goal is to enable compatibility to standard OSes such as Linux on BG/P via virtualization, but to also keep the amount of kernel functionality small enough to facilitate shortening the path to applications and lowering OS noise. Our prototype implementation successfully virtualizes a BG/P version of Linux with support for Ethernet-based communication mapped onto BG/Ps collective and torus network devices. First experiences and experiments show that our VMM still shows a substantial performance hit; nevertheless, our approach poses an interesting OS alternative for Supercomputers, providing the convenience of a fully-featured commodity software stack, while also promising to deliver the scalability and low latency of an HPC OS.

international conference on parallel and distributed systems | 2006

Flexible, low-overhead event logging to support resource scheduling

Jan Stoess; Volkmar Uhlig

Flexible resource management and scheduling policies require detailed system-state information. Traditional, monolithic operating systems with a centralized kernel derive the required information directly, by inspection of internal data structures or maintaining additional accounting data. In systems with distributed or multi-level resource managers that reside in different subsystems and protection domains, direct inspection is unfeasible. In this paper, we present how system event logging - a mechanism usually used in the context of performance analysis and debugging - can also be used for resource scheduling. Event logs provide accumulated, pre-processed, and structured state information independent of the internal structure of individual system components or applications. We describe methods of low-overhead data collection and data analysis and present a prototypical application to multiprocessor scheduling of virtual machines

ieee international conference on high performance computing data and analytics | 2012

A lightweight virtual machine monitor for Blue Gene/P

Jan Stoess; Udo Steinberg; Volkmar Uhlig; Jens Kehne; Jonathan Appavoo; Amos Waterland

In this paper, we present a lightweight, micro-kernel-based virtual machine monitor (VMM) for the Blue Gene/P supercomputer. Our VMM comprises a small µ-kernel with virtualization capabilities and, atop, a user-level VMM component that manages virtual Blue Gene/P cores, memory, and interconnects; we also support running native applications directly atop the µ-kernel. Our design goal is to enable compatibility to standard operating systems such as Linux on BG/P via virtualization, but to also keep the amount of kernel functionality small enough to facilitate shortening the path to applications and lowering operating system noise. Our prototype implementation successfully virtualizes a Blue Gene/P version of Linux with support for Ethernet-based communication mapped onto Blue Gene/P’s collective and torus network devices. Our first experiences and experiments show that our VMM still shows a substantial performance hit, and that support for native application environments is a key requirement towards fully exploiting the capabilities of a supercomputer. Altogether, our approach poses an interesting operating system alternative for supercomputers, providing the convenience of a fully featured commodity software stack, while also promising to deliver the scalability and low latency of an HPC operating system.

ieee international conference on cloud networking | 2012

Light-weight remote communication for high-performance cloud networks

Jens Kehne; Marius Hillenbrand; Jan Stoess; Frank Bellosa

In this paper, we present early experiences with libRIPC, a light-weight communication library for high-performance cloud networks. Coming cloud networks are expected to be tightly interconnected and to show capabilities formerly reserved to high-performance computing. LibRIPC aims to bring the benefits of such architectures to heterogeneous cloud workloads. LibRIPC was designed for low footprint and easy integration; it supports reconfiguration and mutually untrusted communication partners. LibRIPC offers short and long transmit primitives, which are optimized for control messages and bulk data transfer respectively. Early experiments with a Java-based web server indicate that libRIPC integrates well into typical cloud workloads and brings substantial speedup of at least a factor of three for larger data transfers compared to socket-based TCP/IP communication.

programming languages and operating systems | 2009

A microkernel API for fine-grained decomposition

Sebastian Reichelt; Jan Stoess; Frank Bellosa

Microkernel-based operating systems typically require special attention to issues that otherwise arise only in distributed systems. The resulting extra code degrades performance and increases development effort, severely limiting decomposition granularity. We present a new microkernel design that enables OS developers to decompose systems into very fine-grained servers. We avoid the typical obstacles by defining servers as lightweight, passive objects. We replace complex IPC mechanisms by a simple function-call approach, and our passive, module-like server model obviates the need to create threads in every server. Server code is compiled into small self-contained files, which can be loaded into the same address space (for speed) or different address spaces (for safety). For evaluation, we have developed a kernel according to our design, and a networking-capable multi-server system on top. Each driver is a separate server, and the networking stack is split into individual layers. Benchmarks on IA-32 hardware indicate promising results regarding server granularity and performance.

Explore More