Network


Stefan Lankes's latest external collaborations at the country level.

Hotspot


Dive into the research topics where Stefan Lankes is active.

Publication


Featured research published by Stefan Lankes.


International Conference on High Performance Computing and Simulation | 2011

Evaluation and improvements of programming models for the Intel SCC many-core processor

Carsten Clauss; Stefan Lankes; Pablo Reble; Thomas Bemmerl

Since the beginning of the multicore era, parallel processing has become prevalent across the board. On a traditional multicore system, a single operating system manages all cores and schedules threads and processes among them, inherently supported by hardware-implemented cache coherence protocols. However, a further growth of the number of cores per system implies an increasing chip complexity, especially with respect to the cache coherence protocols. Therefore, a very attractive alternative for future many-core systems is to waive the hardware-based cache coherency and to introduce a software-oriented, message-passing based architecture instead: a so-called Cluster-on-Chip architecture. Intel's Single-chip Cloud Computer (SCC), a many-core research processor with 48 non-coherent memory-coupled cores, is a very recent example of such a Cluster-on-Chip architecture. The SCC can be configured to run one operating system instance per core by partitioning the shared main memory in a strict manner. However, it is also possible to access the shared main memory in an unsplit and concurrent manner, provided that cache coherency is then ensured by software. In this paper, we detail our first experiences gained while developing low-level software for message-passing and shared-memory programming on the SCC. In doing so, we evaluate the potential of both programming models and show how these models can be improved, especially with respect to the SCC's many-core architecture.
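
To make the message-passing model discussed above concrete, the following is a minimal sketch of an RCCE-style ping-pong between two SCC cores. It assumes the commonly documented RCCE calls (RCCE_init, RCCE_ue, RCCE_send, RCCE_recv, RCCE_finalize); exact signatures, buffer-alignment requirements, and the program entry point vary between RCCE versions, so treat this as illustrative rather than as the paper's code.

    /* Illustrative RCCE-style ping-pong between core 0 and core 1.
     * Real RCCE may require cache-line-aligned buffers and sizes. */
    #include <stdio.h>
    #include <string.h>
    #include "RCCE.h"

    int main(int argc, char **argv)
    {
        char buf[32];

        RCCE_init(&argc, &argv);
        int me = RCCE_ue();              /* rank of this core (unit of execution) */

        if (me == 0) {
            strcpy(buf, "ping");
            RCCE_send(buf, sizeof(buf), 1);   /* message passing via on-die buffers */
            RCCE_recv(buf, sizeof(buf), 1);
            printf("core 0 received: %s\n", buf);
        } else if (me == 1) {
            RCCE_recv(buf, sizeof(buf), 0);
            strcpy(buf, "pong");
            RCCE_send(buf, sizeof(buf), 0);
        }

        RCCE_finalize();
        return 0;
    }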


International Parallel and Distributed Processing Symposium | 2003

Integration of a CAN-based connection-oriented communication model into Real-Time CORBA

Stefan Lankes; Andreas Jabs; Thomas Bemmerl

The Real-Time CORBA and minimumCORBA specifications are important steps towards defining standard-based middleware which can satisfy real-time requirements in an embedded system. These requirements can only be fulfilled if the middleware utilizes the features of a real-time network. The controller area network (CAN) is one of the most important networks in the field of real-time embedded systems. Consequently, this paper presents a CAN-based connection-oriented point-to-point communication model and its integration into Real-Time CORBA. In order to make efficient use of the advantages of CAN, we present an inter-ORB protocol, which uses smaller message headers for CAN and maps the CAN priorities to a band of CORBA priorities. We also present design and implementation details with some preliminary performance results.
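
As an illustration of the priority-mapping idea described in this abstract, here is a hedged sketch, not the paper's actual protocol: a compact inter-ORB header that fits CAN's 8-byte payloads and a linear mapping from 11-bit CAN identifiers (where lower IDs win bus arbitration) onto a band of Real-Time CORBA priorities. All struct and function names are illustrative.

    #include <stdint.h>

    /* CAN payloads carry at most 8 bytes, so the header must stay far
     * smaller than a regular GIOP header. */
    struct can_iop_header {
        uint8_t msg_type : 3;    /* request, reply, fragment, ...         */
        uint8_t flags    : 5;
        uint8_t request_id;      /* small ID space, reused per connection */
    } __attribute__((packed));

    /* Map the 2048 CAN priorities linearly onto a CORBA priority band.
     * Lower CAN IDs win arbitration, i.e. carry higher priority. */
    static inline int16_t can_to_corba_priority(uint16_t can_id,
                                                int16_t band_low,
                                                int16_t band_high)
    {
        uint16_t can_prio = 2047 - (can_id & 0x7FF);   /* invert: 0 = lowest */
        return (int16_t)(band_low +
                         (int32_t)can_prio * (band_high - band_low) / 2047);
    }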


International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing | 2002

A time-triggered Ethernet protocol for Real-Time CORBA

Stefan Lankes; Andreas Jabs; Michael Reke

The Real-Time CORBA and minimumCORBA specifications are important steps towards defining standard-based middleware which can satisfy real-time requirements in an embedded system. This real-time middleware must be based on a real-time operating system (RTOS) and a real-time network. This article presents a new time-triggered Ethernet protocol that has been implemented under RTLinux. Furthermore, it describes a Real-Time CORBA implementation called ROFES, which is based on this new real-time network.
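
The core of a time-triggered protocol is that each node transmits only in a pre-assigned slot of a fixed communication cycle, so Ethernet collisions cannot occur. The sketch below shows that idea in plain POSIX C with an absolute-time sleep; it is only a user-space illustration, since the paper's protocol lives inside RTLinux, whose periodic-thread API differs, and send_frame(), the cycle length, and the slot length are placeholders.

    #define _POSIX_C_SOURCE 200112L
    #include <time.h>

    #define CYCLE_NS (10 * 1000 * 1000)   /* 10 ms communication cycle (example) */
    #define SLOT_NS  ( 1 * 1000 * 1000)   /* 1 ms slot per node (example)        */

    extern void send_frame(const void *buf, int len);   /* e.g. a raw-socket send */

    void tt_sender(int node_id, const void *buf, int len)
    {
        struct timespec next;
        clock_gettime(CLOCK_MONOTONIC, &next);

        for (;;) {
            /* Advance to the start of the next cycle. */
            next.tv_nsec += CYCLE_NS;
            while (next.tv_nsec >= 1000000000L) { next.tv_sec++; next.tv_nsec -= 1000000000L; }

            /* This node's slot begins node_id slot lengths into the cycle. */
            struct timespec slot = next;
            slot.tv_nsec += (long)node_id * SLOT_NS;
            while (slot.tv_nsec >= 1000000000L) { slot.tv_sec++; slot.tv_nsec -= 1000000000L; }

            clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &slot, NULL);
            send_frame(buf, len);          /* transmit only inside the assigned slot */
        }
    }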


Parallel Processing and Applied Mathematics | 2009

Affinity-on-next-touch: an extension to the Linux kernel for NUMA architectures

Stefan Lankes; Boris Bierbaum; Thomas Bemmerl

For many years now, NUMA architectures have been used in the design of large shared-memory computers, and they are gaining importance even for smaller-scale systems. On a NUMA machine, the distribution of data has a significant impact on the performance and scalability of data-intensive programs because of the difference in access speed between local and remote parts of the memory system. Unfortunately, memory access patterns are often very complex and difficult to predict. Affinity-on-next-touch may be a useful page placement strategy to distribute the data in a suitable manner, but support for it is missing from the current Linux kernel. In this paper, we present an extension to the Linux kernel which implements this strategy and compare it with alternative approaches.
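
For illustration only: next-touch placement is typically requested through a memory-advice call, as in the sketch below. The MADV_ACCESS_LWP constant is the Solaris advice value and merely stands in for whatever flag the paper's Linux extension defines; its name and numeric value here are assumptions.

    #include <sys/mman.h>
    #include <stddef.h>

    #ifndef MADV_ACCESS_LWP
    #define MADV_ACCESS_LWP 100   /* placeholder for the hypothetical advice flag */
    #endif

    /* After this call, each page of `data` is migrated to the NUMA node of the
     * thread that touches it next (e.g. the workers of the next parallel phase)
     * instead of staying on the node where it was first touched. */
    void redistribute_on_next_touch(double *data, size_t n)
    {
        madvise(data, n * sizeof(double), MADV_ACCESS_LWP);
    }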


European Conference on Parallel Processing | 2014

Migration Techniques in HPC Environments

Simon Pickartz; Ramy Gad; Stefan Lankes; Lars Nagel; Tim Süß; André Brinkmann; Stephan Krempel

Process migration is an important feature in modern computing centers, as it allows for a more efficient use and maintenance of hardware. Especially in virtualized infrastructures, it is successfully exploited by schemes for load balancing and energy efficiency. One can divide the tools and techniques into three groups: process-level migration, virtual machine migration, and container-based migration.


EuroMPI'11: Proceedings of the 18th European MPI Users' Group Conference on Recent Advances in the Message Passing Interface | 2011

Performance tuning of SCC-MPICH by means of the proposed MPI-3.0 tool interface

Carsten Clauss; Stefan Lankes; Thomas Bemmerl

The Single-Chip Cloud Computer (SCC) experimental processor is a 48-core concept vehicle created by Intel Labs as a platform for many-core software research. Intel provides a customized programming library for the SCC, called RCCE, that allows for fast message-passing between the cores. For that purpose, RCCE offers an application programming interface (API) with a semantics that is derived from the well-established MPI standard. However, while the MPI standard offers a very broad range of functions, the RCCE API is consciously kept small and far from implementing all the features of the MPI standard. For this reason, we have implemented an SCC-customized MPI library, called SCC-MPICH, which in turn is based upon an extension to the SCC-native RCCE communication library. In this contribution, we will present SCC-MPICH and we will show how performance analysis as well as performance tuning for this library can be conducted by means of a prototype of the proposed MPI-3.0 tool information interface.
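
The MPI-3.0 tool information interface (MPI_T) mentioned above lets a tool enumerate and query the control and performance variables an MPI library chooses to expose. The standard-conforming sketch below simply lists the control variables; which variables actually exist (for example eager/rendezvous thresholds in SCC-MPICH) is implementation-specific and not shown here.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, num_cvar;

        /* MPI_T can be initialized independently of MPI_Init. */
        MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);

        MPI_T_cvar_get_num(&num_cvar);
        for (int i = 0; i < num_cvar; i++) {
            char name[256], desc[256];
            int name_len = sizeof(name), desc_len = sizeof(desc);
            int verbosity, bind, scope;
            MPI_Datatype dtype;
            MPI_T_enum enumtype;

            MPI_T_cvar_get_info(i, name, &name_len, &verbosity, &dtype,
                                &enumtype, desc, &desc_len, &bind, &scope);
            printf("cvar %d: %s - %s\n", i, name, desc);
        }

        MPI_T_finalize();
        return 0;
    }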


Programming Models and Applications for Multicores and Manycores | 2012

Revisiting shared virtual memory systems for non-coherent memory-coupled cores

Stefan Lankes; Pablo Reble; Oliver Sinnen; Carsten Clauss

The growing number of cores per chip implies an increasing chip complexity, especially with respect to hardware-implemented cache coherence protocols. An attractive alternative for future many-core systems is to waive the hardware-based cache coherency and to introduce a software-oriented approach instead: a so-called Cluster-on-Chip architecture. The Single-chip Cloud Computer (SCC) is a recent research processor exemplifying such an architecture. This paper presents an approach to deal with the missing cache coherence protocol by using a software-managed cache coherence system, which is based on the well-established concept of a shared virtual memory (SVM) management system. The SCC's unique features, such as a new memory type integrated directly on the processor die, open up new and capable options for realizing an SVM system. The convincing performance results presented in this paper show that nearly forgotten concepts will become attractive again for future many-core systems.
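
As background for the SVM approach: page-based shared virtual memory is classically built on page protection and fault handling, roughly as in the sketch below. This is a generic, simplified illustration, not the paper's SCC implementation; fetch_page_from_owner() is a placeholder for the actual transport (for instance the SCC's on-die message-passing buffers), and a real system would also distinguish read from write faults and handle invalidations.

    #include <signal.h>
    #include <sys/mman.h>
    #include <stdint.h>
    #include <stddef.h>

    #define PAGE_SIZE 4096
    extern void fetch_page_from_owner(void *page);   /* hypothetical transport call */

    static void svm_fault_handler(int sig, siginfo_t *si, void *ctx)
    {
        /* Round the faulting address down to its page boundary. */
        void *page = (void *)((uintptr_t)si->si_addr & ~(uintptr_t)(PAGE_SIZE - 1));

        mprotect(page, PAGE_SIZE, PROT_READ | PROT_WRITE);  /* make page accessible */
        fetch_page_from_owner(page);                        /* pull the valid copy  */
    }

    void svm_init(void *region, size_t len)
    {
        struct sigaction sa;
        sa.sa_flags = SA_SIGINFO;
        sa.sa_sigaction = svm_fault_handler;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        /* Every first access to the shared region now traps into the handler. */
        mprotect(region, len, PROT_NONE);
    }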


International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing | 2001

Design and implementation of a SCI-based real-time CORBA

Stefan Lankes; Michael Pfeiffer; Thomas Bemmerl

The Real-Time CORBA and minimumCORBA specifications in the forthcoming CORBA 3.0 standard are important steps towards defining standard-based middleware which can satisfy real-time requirements in an embedded system. The article describes these new specifications and an implementation called ROFES. ROFES supports different network architectures, for example the Scalable Coherent Interface (SCI). Furthermore, the article examines the SCI network and whether it possesses real-time characteristics.


International Conference on High Performance Computing and Simulation | 2013

The development of a scheduling system GPUSched for graphics processing units

Ayman Tarakji; Maximillian Marx; Stefan Lankes

Unified programming interfaces and languages like CUDA and OpenCL have been released, allowing algorithms across all fields of application to be coded and executed on graphics processing units. In modern computer systems, the GPU is a processing device accessed through a host; it is not able to run its programs autonomously as processes in the operating system of that host. The host administers the work on the device, including memory transfers, the context switch in the host process to access the device, and the launch of kernels (GPU programs). With more and more software featuring GPU support, access to this shared device has to become more controllable at a system-wide level. Therefore, a better integration of these devices into the operating system, including central administration, will be necessary. This work takes a first step towards such a central administration for graphics processing units by developing a scheduling system called GPUSched. It allows the programmer to formulate different tasks, but then takes over and manages their execution. Using a set of microbenchmarks and applications on a GPU system, we show that achieving better utilization of GPU resources and using them as coprocessors to handle different tasks involve realistic trade-offs.
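
To illustrate the overall shape of such a central GPU administration, here is a hedged sketch of a task queue served by a single scheduler thread that alone launches kernels. It is not GPUSched's real API (which the abstract does not list); gpu_task_t, gpu_task_submit(), and gpu_scheduler() are illustrative names, and the launch callback would wrap the actual CUDA or OpenCL kernel invocation.

    #include <pthread.h>
    #include <stddef.h>

    typedef struct gpu_task {
        void (*launch)(void *args);   /* wraps the actual kernel launch (CUDA/OpenCL) */
        void *args;
        int   priority;               /* used by the scheduling policy                */
        struct gpu_task *next;
    } gpu_task_t;

    static gpu_task_t *queue_head;
    static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  queue_cond = PTHREAD_COND_INITIALIZER;

    /* Called by applications: hand a task description to the scheduler. */
    void gpu_task_submit(gpu_task_t *t)
    {
        pthread_mutex_lock(&queue_lock);
        t->next = queue_head;
        queue_head = t;                    /* a real policy would order by priority */
        pthread_cond_signal(&queue_cond);
        pthread_mutex_unlock(&queue_lock);
    }

    /* Scheduler thread: the only place where kernels are actually launched. */
    void *gpu_scheduler(void *unused)
    {
        (void)unused;
        for (;;) {
            pthread_mutex_lock(&queue_lock);
            while (!queue_head)
                pthread_cond_wait(&queue_cond, &queue_lock);
            gpu_task_t *t = queue_head;
            queue_head = t->next;
            pthread_mutex_unlock(&queue_lock);

            t->launch(t->args);            /* serialize access to the shared GPU */
        }
        return NULL;
    }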


Lecture Notes in Computer Science | 2006

The new multidevice architecture of MetaMPICH in the context of other approaches to grid-enabled MPI

Boris Bierbaum; Carsten Clauss; Martin Pöppe; Stefan Lankes; Thomas Bemmerl

MetaMPICH is an MPI implementation which allows the coupling of different computing resources to form a heterogeneous computing system called a meta computer. Such a coupled system may consist of multiple compute clusters, MPPs, and SMP servers, using different network technologies like Ethernet, SCI, and Myrinet. Several other MPI libraries with similar goals are available. We present the three most important of them and contrast their features and abilities with one another and with MetaMPICH. We especially highlight the recent advances made to MetaMPICH, namely the development of the new multidevice architecture for building a meta computer.
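
The practical point of such a grid-enabled MPI is that application code stays unchanged: a plain MPI program such as the ring exchange sketched below can run across the whole meta computer, while the multidevice layer selects Ethernet, SCI, or Myrinet per connection. The example is a generic MPI illustration (assuming at least two ranks), not code from MetaMPICH itself.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, token;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Pass a token around the ring, possibly crossing cluster boundaries. */
        if (rank == 0) {
            token = 42;
            MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
            MPI_Recv(&token, 1, MPI_INT, size - 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else {
            MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
        }

        printf("rank %d done\n", rank);
        MPI_Finalize();
        return 0;
    }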

Collaboration


Dive into Stefan Lankes's collaborations.

Top Co-Authors

Pablo Reble
RWTH Aachen University

Stephen L. Scott
Oak Ridge National Laboratory