Tim Süß | Researchain

Publication

Featured research published by Tim Süß.

european conference on parallel processing | 2014

Migration Techniques in HPC Environments

Simon Pickartz; Ramy Gad; Stefan Lankes; Lars Nagel; Tim Süß; André Brinkmann; Stephan Krempel

Process migration is an important feature in modern computing centers as it allows for a more efficient use and maintenance of hardware. Especially in virtualized infrastructures it is successfully exploited by schemes for load balancing and energy efficiency. One can divide the tools and techniques into three groups: Process-level migration, virtual machine migration, and container-based migration.

acm symposium on parallel algorithms and architectures | 2014

Scheduling shared continuous resources on many-cores

André Brinkmann; Peter Kling; Friedhelm Meyer auf der Heide; Lars Nagel; Sören Riechers; Tim Süß

We consider the problem of scheduling a number of jobs on m identical processors sharing a continuously divisible resource. Each job j comes with a resource requirement r_j ∈ [0,1]. The job can be processed at full speed if granted its full resource requirement. If receiving only an x-portion of r_j, it is processed at an x-fraction of the full speed. Our goal is to find a resource assignment that minimizes the makespan (i.e., the latest completion time). Variants of such problems, relating the resource assignment of jobs to their processing speeds, have been studied under the term discrete-continuous scheduling. Known results are either very pessimistic or heuristic in nature. In this paper, we suggest and analyze a slightly simplified model. It focuses on the assignment of shared continuous resources to the processors. The job assignment to processors and the ordering of the jobs have already been fixed. It is shown that, even for unit size jobs, finding an optimal solution is NP-hard if the number of processors is part of the input. Positive results for unit size jobs include an efficient optimal algorithm for 2 processors. Moreover, we prove that balanced schedules yield a (2 − 1/m)-approximation for a fixed number of processors. Such schedules are computed by our GreedyBalance algorithm, for which the bound is tight.
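The resource model in this abstract can be illustrated with a small simulation. The sketch below is not the paper's GreedyBalance algorithm; it assumes discrete unit-length time steps, unit size jobs, and a hypothetical priority rule (serve the processor with the most remaining jobs first). The function name `makespan` and the input layout are likewise assumptions made for illustration.

```python
from fractions import Fraction

def makespan(queues):
    """Simulate unit size jobs on processors sharing one resource of total size 1.

    queues[i] is the fixed job order on processor i; each entry is the
    resource requirement r_j in [0, 1]. Assumption: time advances in unit
    steps and a processor works only on its head job within a step.
    """
    pending = [[Fraction(r) for r in q] for q in queues]
    progress = [Fraction(0)] * len(pending)  # finished fraction of each head job
    steps = 0
    while any(pending):
        steps += 1
        available = Fraction(1)  # the shared resource is renewed every step
        # Hypothetical priority rule: most remaining jobs first (ties by index).
        order = sorted(range(len(pending)), key=lambda i: -len(pending[i]))
        for i in order:
            if not pending[i]:
                continue
            r = pending[i][0]
            if r == 0:
                # A job that needs no resource runs at full speed.
                progress[i] = Fraction(1)
            else:
                # Receiving an x-portion of r_j yields an x-fraction of full speed.
                share = min(r * (1 - progress[i]), available)
                available -= share
                progress[i] += share / r
            if progress[i] == 1:
                pending[i].pop(0)
                progress[i] = Fraction(0)
    return steps
```

For example, two processors that each hold one job with requirement 3/4 cannot both run at full speed in the first step, so `makespan([[Fraction(3, 4)], [Fraction(3, 4)]])` needs two steps under this rule.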

european conference on computer systems | 2015

Deriving and comparing deduplication techniques using a model-based classification

Jürgen Kaiser; André Brinkmann; Tim Süß; Dirk Meister

Data deduplication has been a hot research topic and a large number of systems have been developed. These systems are usually seen as an inherently linked set of characteristics. However, a detailed analysis shows independent concepts that can be used in other systems. In this work, we perform this analysis on the main representatives of deduplication systems. We embed the results in a model, which shows two yet unexplored combinations of characteristics. In addition, the model enables a comprehensive evaluation of the representatives and the two new systems. We perform this evaluation based on real world data sets.

ieee international conference on high performance computing data and analytics | 2017

A configurable rule based classful token bucket filter network request scheduler for the lustre file system

Yingjin Qian; Xi Li; Shuichi Ihara; Lingfang Zeng; Jürgen Kaiser; Tim Süß; André Brinkmann

HPC file systems today work in a best-effort manner where individual applications can flood the file system with requests, effectively leading to a denial of service for all other tasks. This paper presents a classful Token Bucket Filter (TBF) policy for the Lustre file system. The TBF enforces Remote Procedure Call (RPC) rate limitations based on (potentially complex) Quality of Service (QoS) rules. The QoS rules are enforced in Lustre's Object Storage Servers, where each request is assigned to an automatically created QoS class. The proposed QoS implementation for Lustre enables various features for each class, including the support for high-priority and real-time requests even under heavy load and the utilization of spare bandwidth by less important tasks under light load. The framework also enables dependent rules to change a job's RPC rate even at very small timescales. Furthermore, we propose a Global Rate Limiting (GRL) algorithm to enforce system-wide RPC rate limitations.
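The general idea of a classful token bucket can be sketched in a few lines. This is a generic illustration, not Lustre's implementation: the class names `TokenBucket` and `ClassfulLimiter`, the rule format (a plain mapping from job identifier to rate), and the explicit `now` timestamps are all assumptions chosen to keep the example deterministic.

```python
class TokenBucket:
    """Generic token bucket: tokens refill at `rate` per second up to `capacity`."""

    def __init__(self, rate, capacity, start=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.last = start

    def allow(self, now):
        # Refill according to the elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1  # each admitted request consumes one token
            return True
        return False


class ClassfulLimiter:
    """One bucket per class, created on first use (hypothetical rule format)."""

    def __init__(self, rules, default_rate):
        self.rules = rules              # assumed: {job_id: requests per second}
        self.default_rate = default_rate
        self.buckets = {}

    def allow(self, job_id, now):
        if job_id not in self.buckets:
            rate = self.rules.get(job_id, self.default_rate)
            self.buckets[job_id] = TokenBucket(rate, capacity=rate, start=now)
        return self.buckets[job_id].allow(now)
```

With `ClassfulLimiter({"jobA": 2.0}, default_rate=1.0)`, a burst from `jobA` is throttled once its two tokens are spent, while requests from other jobs draw from their own buckets and are unaffected.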

conference on computability in europe | 2010

On the complexity of local search for weighted standard set problems

Dominic Dumrauf; Tim Süß

In this paper, we study the complexity of computing locally optimal solutions for weighted versions of standard set problems such as SETCOVER, SETPACKING, and many more. For our investigation, we use the framework of PLS, as defined in Johnson et al. [14]. We show that for most of these problems, computing a locally optimal solution is already PLS-complete for a simple natural neighborhood of size one. For the local search versions of weighted SETPACKING and SETCOVER, we derive tight bounds for a simple neighborhood of size two. To the best of our knowledge, these are among the very few PLS results about local search for weighted standard set problems.

reconfigurable computing and fpgas | 2009

Communication Performance Characterization for Reconfigurable Accelerator Design on the XD1000

Tobias Schumacher; Tim Süß; Christian Plessl; Marco Platzner

Providing customized memory architectures is key to achieving high performance with reconfigurable accelerators. Since reconfigurable computers provide limited possibilities for customizing the organization of external memory, a specific challenge is to make use of the existing memory layout in a flexible, yet efficient way. In this paper we build on IMORC, our architectural template and on-chip network for creating reconfigurable accelerators, and discuss its infrastructure for accessing memory. We characterize the IMORC communication bandwidth on the XtremeData XD1000 reconfigurable computer. Based on this characterization, we present a z-buffer compositing accelerator which is able to double the frame rate of a parallel renderer.
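Z-buffer compositing itself is a simple per-pixel operation, which the following software sketch illustrates. It says nothing about the FPGA design described in the abstract; the function name `composite`, the flat pixel lists, and the convention that smaller depth values are nearer to the viewer are assumptions made for this example.

```python
def composite(color_a, depth_a, color_b, depth_b):
    """Merge two partial renderings pixel by pixel, keeping the nearer fragment.

    All four arguments are flat lists of equal length. Assumption: a smaller
    depth value means the fragment is closer to the viewer; ties keep image A.
    """
    out_color, out_depth = [], []
    for ca, da, cb, db in zip(color_a, depth_a, color_b, depth_b):
        if da <= db:
            out_color.append(ca)
            out_depth.append(da)
        else:
            out_color.append(cb)
            out_depth.append(db)
    return out_color, out_depth
```

In a parallel renderer, each node would produce one color and depth image for its share of the scene, and the results would be merged pairwise in this fashion to form the final frame.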

ieee international conference on high performance computing, data, and analytics | 2016

Accelerating Application Migration in HPC

Ramy Gad; Simon Pickartz; Tim Süß; Lars Nagel; Stefan Lankes; André Brinkmann

It is predicted that the number of cores per node will rapidly increase with the upcoming era of exascale supercomputers. As a result, multiple applications will have to share one node and compete for the (often scarce) resources available on this node. Furthermore, the growing number of hardware components causes a decrease in the mean time between failures. Application migration between nodes has been proposed as a tool to mitigate these two problems: Bottlenecks due to resource sharing can be addressed by load balancing schemes which migrate applications; and hardware errors can often be tolerated by the system if faulty nodes are detected and processes are migrated ahead of time.

networking architecture and storages | 2010

Evaluation of a c-Load-Collision-Protocol for Load-Balancing in Interactive Environments

Tim Süß; Timo Wiesemann; Matthias Fischer

Many professional cluster systems consist of nodes with different hardware configurations. Such heterogeneous environments require different load-balancing techniques than homogeneous environments. The c-load-collision-protocol is able to achieve good results for data-management purposes. Using this protocol, we propose a load-balancing approach for interactive rendering environments. For this work, we implemented a parallel rendering system and took different picking strategies into account to compare the results. The advantage of our approach compared to other approaches is that we group the available nodes of a cluster into two different categories, based on their hardware capabilities. Some nodes are used solely for rendering, while others serve as secondary storage and assist the former by performing auxiliary calculations.

international symposium on visual computing | 2012

Asynchronous Occlusion Culling on Heterogeneous PC Clusters for Distributed 3D Scenes

Tim Süß; Clemens Koch; Claudius Jähn; Matthias Fischer; Friedhelm Meyer auf der Heide

We present a parallel rendering system for heterogeneous PC clusters to visualize massive models. One single, powerful visualization node is supported by a group of backend nodes with weak graphics performance. While the visualization node renders the visible objects, the backend nodes asynchronously perform visibility tests and supply the front end with visible scene objects. The visualization node stores only currently visible objects in its memory, while the scene is distributed among the backend nodes’ memory without redundancy. To compute the occlusion tests efficiently even though each backend node stores only a fraction of the original geometry, we complete the scene by adding highly simplified versions of the objects stored on other nodes. We test our system with 15 backend nodes. It is able to render a large aircraft model of ≈ 350 M polygons (≈ 8.5 GiB) at 20 to 30 fps and thus allows a real-time walk-through.

file and storage technologies | 2013