Max Plauth
University of Potsdam
Publications
Featured research published by Max Plauth.
International Symposium on Computing and Networking | 2015
Max Plauth; Frank Feinbube; Frank Schlegel; Andreas Polze
GPU compute devices have become very popular for general-purpose computations. However, the SIMD-like hardware of graphics processors is currently not well suited for irregular workloads such as searching unbalanced trees. In order to mitigate this drawback, NVIDIA introduced an extension to GPU programming models called dynamic parallelism. This extension enables GPU programs to spawn new units of work directly on the GPU, allowing the refinement of subsequent work items based on intermediate results without any involvement of the main CPU. This work investigates methods for employing dynamic parallelism with the goal of improved workload distribution for tree search algorithms on modern GPU hardware. For the evaluation of the proposed approaches, a case study is conducted on the n-queens problem. Extensive benchmarks indicate that the benefits of improved resource utilization fail to outweigh the high management overhead and runtime limitations caused by the very fine granularity of the investigated problem. Nevertheless, novel memory management concepts for passing parameters to child grids are presented. These general concepts are applicable to other, more coarse-grained problems that benefit from the use of dynamic parallelism.
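To make the mechanism concrete, the following minimal CUDA sketch illustrates dynamic parallelism in general: a parent kernel launches child kernels directly on the GPU, passing parameters through global device memory. The kernel names (parentKernel, childKernel) and the placeholder workload are hypothetical; this is not the paper's tree search implementation.

    // Minimal dynamic parallelism sketch (hypothetical workload).
    // Build: nvcc -arch=sm_35 -rdc=true dynpar.cu
    #include <cstdio>

    // Child kernel: refines one subproblem (placeholder work).
    __global__ void childKernel(const int *params, int offset) {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx == 0)
            printf("child grid for subproblem %d (param %d)\n", offset, params[offset]);
    }

    // Parent kernel: spawns one child grid per intermediate result,
    // without returning control to the host CPU.
    __global__ void parentKernel(const int *params, int numSubproblems) {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx < numSubproblems) {
            // Device-side launch: parameters are passed via global memory,
            // since child grids cannot access the parent's local or shared memory.
            childKernel<<<1, 32>>>(params, idx);
        }
    }

    int main() {
        const int n = 4;
        int hostParams[n] = {10, 20, 30, 40};
        int *devParams;
        cudaMalloc(&devParams, n * sizeof(int));
        cudaMemcpy(devParams, hostParams, n * sizeof(int), cudaMemcpyHostToDevice);

        parentKernel<<<1, n>>>(devParams, n);   // single host-side launch
        cudaDeviceSynchronize();
        cudaFree(devParams);
        return 0;
    }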
International Parallel and Distributed Processing Symposium | 2016
Max Plauth; Wieland Hagen; Frank Feinbube; Felix Eberhardt; Lena Feinbube; Andreas Polze
The domains of parallel and distributed computing have been converging continuously, to the degree that state-of-the-art server systems incorporate characteristics from both domains: they comprise a hierarchy of enclosures, where each enclosure houses multiple processor sockets and each socket in turn contains multiple memory controllers. A global address space and cache coherency are provided by multiple layers of fast interconnection technologies, even across enclosures. The growing popularity of such systems creates an urgent need for efficient mappings of cardinal algorithms onto such hierarchical architectures. However, the growing complexity of these systems and the inconsistencies between the implementation strategies of different hardware vendors make it increasingly difficult to find mapping strategies that are universally efficient. In this paper, we present scalable optimization and mapping strategies in a case study of the popular Scale-Invariant Feature Transform (SIFT) computer vision algorithm. Our approaches are evaluated on a state-of-the-art hierarchical Non-Uniform Memory Access (NUMA) system with 240 physical cores and 12 terabytes of memory, apportioned across 16 NUMA nodes (sockets). SIFT is particularly interesting since the algorithm exhibits a variety of common data access patterns, thus allowing us to discuss the scaling properties of optimization strategies from the distributed and parallel computing domains and their applicability to emerging server systems.
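As a rough illustration of the locality considerations involved, the host-side sketch below allocates one data partition per NUMA node with libnuma and pins a worker thread to each node. It is a generic first-touch pattern under the assumption that libnuma is available, not the SIFT optimization from the paper.

    // Generic NUMA locality sketch (hypothetical, not the paper's SIFT code).
    // Build: g++ -O2 -pthread numa_sketch.cpp -lnuma
    #include <numa.h>
    #include <thread>
    #include <vector>
    #include <cstdio>

    static void worker(int node, size_t bytes) {
        numa_run_on_node(node);                       // pin the thread to its node
        float *part = static_cast<float*>(numa_alloc_onnode(bytes, node));
        if (!part) return;
        for (size_t i = 0; i < bytes / sizeof(float); ++i)
            part[i] = 1.0f;                           // touch and compute on local memory
        numa_free(part, bytes);
    }

    int main() {
        if (numa_available() < 0) { std::puts("no NUMA support"); return 1; }
        int nodes = numa_max_node() + 1;
        std::vector<std::thread> threads;
        for (int n = 0; n < nodes; ++n)
            threads.emplace_back(worker, n, size_t(64) << 20);  // 64 MiB per node
        for (auto &t : threads) t.join();
        return 0;
    }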
IEEE International Conference on Cloud Networking | 2017
Mael Kimmerlin; Max Plauth; Seppo Heikkila; Tapio Niemi
The SSICLOPS consortium recently designed a transparent virtual network expansion mechanism for OpenStack. In this paper, we build upon this mechanism and propose features that improve the interconnection of inter-cloud federations. Using distributed in-memory databases and High Energy Physics (HEP) applications as two representative cloud computing workloads, we performed extensive performance evaluations demonstrating that, in a setup comprising five sites across Europe, the performance of our interconnection agent is similar to that of the legacy VPNaaS feature provided by OpenStack. However, the interconnection agent is not restricted to these exemplary use cases, as it is applicable to arbitrary workloads. This enables us to work towards transparent inter-cloud live migration from a networking point of view.
European Conference on Networks and Communications | 2017
Mael Kimmerlin; Peer Hasselmeyer; Seppo Heikkila; Max Plauth; Pawel Parol; Pasi Sarolahti
Cloud federation is receiving increasing attention due to the benefits of resilience and locality it brings to cloud providers and users. Our analysis of three diverse use cases shows that existing solutions do not address the federation needs of such applications. In this paper, we present an alternative approach to network federation, providing a model based on cloud-to-cloud agreements. In our scenarios, companies hosting their own OpenStack clouds need to run machines transparently in another cloud, provided by a company they have an agreement with. Our solution provides multiple benefits to cloud providers and users, which are detailed in this paper. Our implementation outperforms the VPNaaS solution in OpenStack in terms of throughput.
IEEE Congress on Services | 2008
Matthias Jacob; Alexander Kuscher; Max Plauth; Christoph Thiele
There is a large amount of information about celebrities spread all over the Web, hidden inside innumerable news articles and blogs, pictures on Flickr, or videos on YouTube. Having this information combined and aggregated would be of great benefit to many customers. In this document, we describe the architecture and the (business) value of a system that not only collates information pre-formatted by other Web services, but also provides a self-developed named entity recognition algorithm that extracts the names of celebrities from different data sources, which are then processed and enriched by our mash-up application.
International Symposium on Computing and Networking | 2016
Wieland Hagen; Max Plauth; Felix Eberhardt; Frank Feinbube; Andreas Polze
For the implementation of data-intensive C++ applications for cache coherent Non-Uniform Memory Access (NUMA) systems, both massive parallelism and data locality have to be considered. While massive parallelism has been largely understood, the shared memory paradigm is still deeply entrenched in the mindset of many C++ software developers. Hence, data locality aspects of NUMA systems have been widely neglected thus far. At first sight, applying shared nothing approaches might seem like a viable workaround to address locality. However, we argue that developers should be enabled to address locality without having to surrender the advantages of the shared address space of cache coherent NUMA systems. Based on an extensive review of parallel programming languages and frameworks, we propose a programming model specialized for NUMA-aware C++ development that incorporates essential mechanisms for parallelism and data locality. We suggest that these mechanisms should be used to implement specialized data structures and algorithm templates which encapsulate locality, data distribution, and implicit data parallelism. We present an implementation of the proposed programming model in the form of a C++ framework. To demonstrate the applicability of our programming model, we implement a prototypical application on top of this framework and evaluate its performance.
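The sketch below hints at what such an encapsulating data structure could look like: a hypothetical partitioned_vector template that distributes its elements across NUMA nodes and traverses each partition with a thread pinned to the owning node. It is only an illustration of the idea, not the framework presented in the paper.

    // Hypothetical NUMA-aware container sketch, not the paper's framework.
    // Build: g++ -O2 -pthread partitioned.cpp -lnuma
    #include <numa.h>
    #include <algorithm>
    #include <cstddef>
    #include <cstdio>
    #include <functional>
    #include <thread>
    #include <vector>

    template <typename T>
    class partitioned_vector {
        struct partition { T *data; std::size_t size; int node; };
        std::vector<partition> parts_;
    public:
        explicit partitioned_vector(std::size_t total) {
            int nodes = numa_max_node() + 1;
            std::size_t chunk = (total + nodes - 1) / nodes;
            for (int n = 0; n < nodes; ++n) {
                std::size_t used = std::min(total, std::size_t(n) * chunk);
                std::size_t sz = std::min(chunk, total - used);
                if (sz == 0) continue;
                // Each partition lives on its own NUMA node.
                parts_.push_back({static_cast<T*>(numa_alloc_onnode(sz * sizeof(T), n)), sz, n});
            }
        }
        ~partitioned_vector() {
            for (auto &p : parts_) numa_free(p.data, p.size * sizeof(T));
        }
        // Apply f to every element, one thread pinned to each partition's node.
        void for_each(const std::function<void(T&)> &f) {
            std::vector<std::thread> workers;
            for (auto &p : parts_)
                workers.emplace_back([&p, &f] {
                    numa_run_on_node(p.node);      // compute where the data lives
                    for (std::size_t i = 0; i < p.size; ++i) f(p.data[i]);
                });
            for (auto &w : workers) w.join();
        }
    };

    int main() {
        if (numa_available() < 0) return 1;
        partitioned_vector<double> v(1 << 20);
        v.for_each([](double &x) { x = 42.0; });
        std::puts("done");
        return 0;
    }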
Parallel and Distributed Computing: Applications and Technologies | 2014
Max Plauth; Frank Feinbube; Peter Tröger; Andreas Polze
Blind Signal Separation is an algorithmic problem class that deals with the restoration of original signal data from a signal mixture. Implementations such as FastICA are optimized for parallelization on CPUs or first-generation GPU hardware. With the advent of modern, compute-centric GPU hardware with powerful features such as dynamic parallelism support, these solutions no longer leverage the available hardware performance in the best possible way. We present an optimized implementation of the FastICA algorithm that is specifically tailored to next-generation GPU architectures such as NVIDIA Kepler. Our prototype implementation achieves a two-digit speedup factor compared to a multithreaded CPU implementation. Our custom matrix multiplication kernels, tailored specifically to the use case, contribute to the speedup by delivering better performance than the state-of-the-art CUBLAS library.
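For reference, a generic shared-memory tiled matrix multiplication kernel is sketched below; the paper's kernels are specialized for the matrix shapes occurring in FastICA, which this simplified example does not reproduce.

    // Generic tiled matrix multiplication sketch, C = A * B for N x N matrices.
    // Build: nvcc -O2 matmul.cu
    #include <cuda_runtime.h>
    #include <cstdio>

    #define TILE 16

    __global__ void matmulTiled(const float *A, const float *B, float *C, int N) {
        // Shared-memory tiles reduce global memory traffic.
        __shared__ float As[TILE][TILE];
        __shared__ float Bs[TILE][TILE];
        int row = blockIdx.y * TILE + threadIdx.y;
        int col = blockIdx.x * TILE + threadIdx.x;
        float acc = 0.0f;
        for (int t = 0; t < (N + TILE - 1) / TILE; ++t) {
            int aCol = t * TILE + threadIdx.x;
            int bRow = t * TILE + threadIdx.y;
            As[threadIdx.y][threadIdx.x] = (row < N && aCol < N) ? A[row * N + aCol] : 0.0f;
            Bs[threadIdx.y][threadIdx.x] = (bRow < N && col < N) ? B[bRow * N + col] : 0.0f;
            __syncthreads();
            for (int k = 0; k < TILE; ++k)
                acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
            __syncthreads();
        }
        if (row < N && col < N)
            C[row * N + col] = acc;
    }

    int main() {
        const int N = 512;
        size_t bytes = size_t(N) * N * sizeof(float);
        float *A, *B, *C;
        cudaMallocManaged(&A, bytes);
        cudaMallocManaged(&B, bytes);
        cudaMallocManaged(&C, bytes);
        for (int i = 0; i < N * N; ++i) { A[i] = 1.0f; B[i] = 2.0f; }
        dim3 block(TILE, TILE);
        dim3 grid((N + TILE - 1) / TILE, (N + TILE - 1) / TILE);
        matmulTiled<<<grid, block>>>(A, B, C, N);
        cudaDeviceSynchronize();
        printf("C[0] = %f (expected %f)\n", C[0], 2.0f * N);
        cudaFree(A); cudaFree(B); cudaFree(C);
        return 0;
    }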
IEEE International Conference on Cloud Engineering | 2018
Jens Hiller; Mael Kimmerlin; Max Plauth; Seppo Heikkila; Stefan Klauck; Ville Lindfors; Felix Eberhardt; Dariusz Bursztynowski; Jesus Llorente Santos; Oliver Hohlfeld; Klaus Wehrle
Cloud computing offers the potential to store, manage, and process data in highly available, scalable, and elastic environments. Yet, these environments still provide very limited and inflexible means for customers to control their data. For example, customers can neither specify the security of inter-cloud communication, bearing the risk of information leakage, nor comply with laws requiring data to be kept in the originating jurisdiction, nor control the sharing of data with third parties on a fine-granular basis. This lack of control can hinder cloud adoption for data that falls under regulations. In this paper, we show in six use cases how cloud environments can be enriched with policy language support to give customers control over cloud data. Our use cases are based on realizing policy language support in all three cloud environment layers, i.e., IaaS, PaaS, and SaaS. Specifically, we present policy-aware resource management (with OpenStack) and dynamic network configuration. With CERN's big data storage and the in-memory database Hyrise, we show the realization for storage, and we further exemplify policy-aware cloud processing with network function virtualization, which enables Orange to offload customer home gateways to the cloud. Finally, we discuss the benefits of policy support in F-Secure's Security Cloud. These use cases show the feasibility of realizing customer control with policy support in the cloud. Thus, our work enables customers with regulated data to tap cloud benefits and significantly broadens the market for cloud providers.
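As a simplified illustration of what such customer control could look like at the placement level, the sketch below filters candidate cloud regions against a data policy that restricts the allowed jurisdictions. The Policy and CloudRegion types are hypothetical and do not reflect the actual policy language or its OpenStack integration.

    // Hypothetical jurisdiction policy check, for illustration only.
    #include <iostream>
    #include <set>
    #include <string>
    #include <vector>

    struct Policy {
        std::set<std::string> allowedJurisdictions;  // where the data may be stored
        bool allowThirdPartySharing = false;
    };

    struct CloudRegion {
        std::string name;
        std::string jurisdiction;
    };

    // Return only the regions in which the policy permits the data to be placed.
    std::vector<CloudRegion> eligibleRegions(const Policy &p,
                                             const std::vector<CloudRegion> &candidates) {
        std::vector<CloudRegion> result;
        for (const auto &r : candidates)
            if (p.allowedJurisdictions.count(r.jurisdiction))
                result.push_back(r);
        return result;
    }

    int main() {
        Policy p{{"EU"}, false};
        std::vector<CloudRegion> regions = {{"eu-central", "EU"}, {"us-east", "US"}};
        for (const auto &r : eligibleRegions(p, regions))
            std::cout << "placement allowed: " << r.name << "\n";
        return 0;
    }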
International Parallel and Distributed Processing Symposium | 2017
Max Plauth; Christoph Sterz; Felix Eberhardt; Frank Feinbube; Andreas Polze
Cost models play an important role in the efficient implementation of software systems. These models can be embedded in operating systems and execution environments to optimize execution at run time. Even though non-uniform memory access (NUMA) architectures dominate today's server landscape, there is still a lack of parallel cost models that represent NUMA systems sufficiently. Therefore, the existing NUMA models are analyzed, and a two-step performance assessment strategy is proposed that incorporates low-level hardware counters as performance indicators. To support the two-step strategy, multiple tools are developed that accumulate and enrich specific hardware event counter information to explore, measure, and visualize these low-overhead performance indicators. The tools are showcased and discussed alongside specific experiments in the realm of performance assessment.
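The following standalone sketch shows how such a low-overhead hardware event counter can be read on Linux via perf_event_open, here counting cache misses around a memory-bound loop. It is a generic example and not one of the tools developed in the paper.

    // Reading a hardware event counter via perf_event_open (generic sketch).
    // Build: g++ -O2 perf_sketch.cpp
    #include <linux/perf_event.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <cstdint>
    #include <cstdio>
    #include <cstring>
    #include <vector>

    static int perf_open(uint32_t type, uint64_t config) {
        perf_event_attr attr;
        std::memset(&attr, 0, sizeof(attr));
        attr.type = type;
        attr.size = sizeof(attr);
        attr.config = config;
        attr.disabled = 1;
        attr.exclude_kernel = 1;
        attr.exclude_hv = 1;
        // Measure the calling thread on any CPU.
        return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
    }

    int main() {
        int fd = perf_open(PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_MISSES);
        if (fd < 0) { std::perror("perf_event_open"); return 1; }

        std::vector<char> buffer(256 * 1024 * 1024);    // memory-bound workload

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
        for (size_t i = 0; i < buffer.size(); i += 64)  // touch one cache line at a time
            buffer[i]++;
        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

        uint64_t misses = 0;
        read(fd, &misses, sizeof(misses));
        std::printf("cache misses: %llu\n", (unsigned long long)misses);
        close(fd);
        return 0;
    }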
European Conference on Service-Oriented and Cloud Computing | 2017
Max Plauth; Lena Feinbube; Andreas Polze
The increasing prevalence of the microservice paradigm creates a new demand for low-overhead virtualization techniques. Complementing containerization, unikernels are emerging as an alternative approach. With both techniques undergoing rapid improvements, the current landscape of lightweight virtualization approaches presents a confusing picture, complicating the task of choosing a technology suited to an intended purpose. This work provides a comprehensive performance comparison covering containers, unikernels, whole-system virtualization, native hardware, and combinations thereof. Representing common workloads in microservice-based applications, we assess application performance using HTTP servers and a key-value store. With the microservice deployment paradigm in mind, we also evaluate further characteristics such as startup time, image size, network latency, and memory footprint.
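As a trivial illustration of one of these metrics, the sketch below times the cold start of a native process versus a container by measuring how long the respective command takes to complete. The benchmark methodology in the paper is considerably more thorough; the docker invocation assumes a locally available alpine image.

    // Naive cold-start timing sketch, not the paper's benchmark harness.
    #include <chrono>
    #include <cstdio>
    #include <cstdlib>
    #include <string>

    static double timedRun(const std::string &cmd) {
        auto start = std::chrono::steady_clock::now();
        int rc = std::system(cmd.c_str());            // blocks until the command finishes
        auto end = std::chrono::steady_clock::now();
        if (rc != 0) std::fprintf(stderr, "command failed: %s\n", cmd.c_str());
        return std::chrono::duration<double, std::milli>(end - start).count();
    }

    int main() {
        // Native process vs. container running the same no-op workload.
        std::printf("native   : %.1f ms\n", timedRun("/bin/true"));
        std::printf("container: %.1f ms\n", timedRun("docker run --rm alpine true"));
        return 0;
    }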