Sven Mallach | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Sven Mallach is active.

Explore More

Publication

Featured researches published by Sven Mallach.

Journal of Computational Science | 2011

A Simulation Suite for Lattice-Boltzmann based Real-Time CFD Applications Exploiting Multi-Level Parallelism on modern Multi- and Many-Core Architectures

Markus Geveler; Dirk Ribbrock; Sven Mallach; Dominik Göddeke

We present a software approach to hardware-oriented numerics which builds upon an augmented, previously published set of open-source libraries facilitating portable code development and optimisation on a wide range of modern computer architectures. In order to maximise e ciency, we exploit all levels of parallelism, including vectorisation within CPU cores, the Cell BE and GPUs, shared memory thread-level parallelism between cores, and parallelism between heterogeneous distributed memory resources in clusters. To evaluate and validate our approach, we implement a collection of modular building blocks for the easy and fast assembly and development of CFD applications based on the shallow water equations: We combine the Lattice-Boltzmann method with fluid-structure interaction techniques in order to achieve real-time simulations targeting interactive virtual environments. Our results demonstrate that recent multi-core CPUs outperform the Cell BE, while GPUs are significantly faster than conventional multi-threaded SSE code. In addition, we verify good scalability properties of our application on small clusters.

Computer Physics Communications | 2009

HONEI: A collection of libraries for numerical computations targeting multiple processor architectures

Danny van Dyk; Markus Geveler; Sven Mallach; Dirk Ribbrock; Dominik Göddeke; Carsten Gutwenger

Abstract We present HONEI, an open-source collection of libraries offering a hardware oriented approach to numerical calculations. HONEI abstracts the hardware, and applications written on top of HONEI can be executed on a wide range of computer architectures such as CPUs, GPUs and the Cell processor. We demonstrate the flexibility and performance of our approach with two test applications, a Finite Element multigrid solver for the Poisson problem and a robust and fast simulation of shallow water waves. By linking against HONEIs libraries, we achieve a two-fold speedup over straight forward C++ code using HONEIs SSE backend, and additional 3–4 and 4–16 times faster execution on the Cell and a GPU. A second important aspect of our approach is that the full performance capabilities of the hardware under consideration can be exploited by adding optimised application-specific operations to the HONEI libraries. HONEI provides all necessary infrastructure for development and evaluation of such kernels, significantly simplifying their development. Program summary Program title: HONEI Catalogue identifier: AEDW_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEDW_v1_0.html Program obtainable from: CPC Program Library, Queens University, Belfast, N. Ireland Licensing provisions: GPLv2 No. of lines in distributed program, including test data, etc.: 216 180 No. of bytes in distributed program, including test data, etc.: 1 270 140 Distribution format: tar.gz Programming language: C++ Computer: x86, x86_64, NVIDIA CUDA GPUs, Cell blades and PlayStation 3 Operating system: Linux RAM: at least 500 MB free Classification: 4.8, 4.3, 6.1 External routines: SSE: none; [1] for GPU, [2] for Cell backend Nature of problem: Computational science in general and numerical simulation in particular have reached a turning point. The revolution developers are facing is not primarily driven by a change in (problem-specific) methodology, but rather by the fundamental paradigm shift of the underlying hardware towards heterogeneity and parallelism. This is particularly relevant for data-intensive problems stemming from discretisations with local support, such as finite differences, volumes and elements. Solution method: To address these issues, we present a hardware aware collection of libraries combining the advantages of modern software techniques and hardware oriented programming. Applications built on top of these libraries can be configured trivially to execute on CPUs, GPUs or the Cell processor. In order to evaluate the performance and accuracy of our approach, we provide two domain specific applications; a multigrid solver for the Poisson problem and a fully explicit solver for 2D shallow water equations. Restrictions: HONEI is actively being developed, and its feature list is continuously expanded. Not all combinations of operations and architectures might be supported in earlier versions of the code. Obtaining snapshots from http://www.honei.org is recommended. Unusual features: The considered applications as well as all library operations can be run on NVIDIA GPUs and the Cell BE. Running time: Depending on the application, and the input sizes. The Poisson solver executes in few seconds, while the SWE solver requires up to 5 minutes for large spatial discretisations or small timesteps. References: [1] http://www.nvidia.com/cuda . [2] http://www.ibm.com/developerworks/power/cell .

Proceedings of the 2015 International Workshop on Code Optimisation for Multi and Many Cores | 2015

Hardware-Aware Automatic Code-Transformation to Support Compilers in Exploiting the Multi-Level Parallel Potential of Modern CPUs

Dustin Feld; Thomas Soddemann; Michael Jünger; Sven Mallach

Modern compilers offer more and more capabilities to automatically parallelize code-regions if these match certain properties. However, there are several application kernels that, although rather simple transformations would suffice in order to make them match these properties, are either not at all parallelized by state-of-the-art compilers or could at least be improved w.r.t. their performance. This paper proposes a loop-tiling approach focusing on automatic vectorization and multi-core parallelization, with emphasis on a smart cache exploitation. The method is based on polyhedral code transformations that are applied as a pre-compilation step and it is shown to help compilers in generating more and better parallel code-regions. It automatically adapts to hardware parameters such as the SIMD register width and cache sizes. Further, it takes memory-access patterns into account and is capable to minimize communication among tiles that are to be processed by different cores. An extensive computational study shows significant improvements in the number of instructions vectorized, cache miss rates, and running times for a range of application kernels. The method often outperforms the internal auto-parallelization techniques implemented into gcc and icc.

Facing the multicore-challenge | 2010

Improved scalability by using hardware-aware thread affinities

Sven Mallach; Carsten Gutwenger

The complexity of an efficient thread management steadily rises with the number of processor cores and heterogeneities in the design of system architectures, e.g., the topologies of execution units and the memory architecture. In this paper, we show that using information about the system topology combined with a hardware-aware thread management is worthwhile. We present such a hardware-aware approach that utilizes thread affinity to automatically steer the mapping of threads to cores and experimentally analyze its performance. Our experiments show that we can achieve significantly better scalability and runtime stability compared to the ordinary dispatching of threads provided by the operating system.

graph drawing | 2016

Compact Layered Drawings of General Directed Graphs

Adalat Jabrayilov; Sven Mallach; Petra Mutzel; Ulf Rüegg; Reinhard von Hanxleden

We consider the problem of layering general directed graphs under height and possibly also width constraints. Given a directed graph \(G=(V,A)\) and a maximal height, we propose a layering approach that minimizes a weighted sum of the number of reversed arcs, the arc lengths, and the width of the drawing. We call this the Compact Generalized Layering Problem (CGLP). Here, the width of a drawing is defined as the maximum sum of the number of vertices placed on a layer and the number of dummy vertices caused by arcs traversing the layer. The CGLP is \(\mathcal {NP}\)-hard. We present two MIP models for this problem. The first one (EXT) is our extension of a natural formulation for directed acyclic graphs as suggested by Healy and Nikolov. The second one (CGL) is a new formulation based on partial orderings. Our computational experiments on two benchmark sets show that the CGL formulation can be solved much faster than EXT using standard commercial MIP solvers. Moreover, we suggest a variant of CGL, called MML, that can be seen as a heuristic approach. In our experiments, MML clearly improves on CGL in terms of running time while it does not considerably increase the average arc lengths and widths of the layouts although it solves a slightly different problem where the dummy vertices are not taken into account.

A Quarterly Journal of Operations Research | 2018

Compact linearization for binary quadratic problems subject to assignment constraints

Sven Mallach

We introduce and prove new necessary and sufficient conditions to carry out a compact linearization approach for a general class of binary quadratic problems subject to assignment constraints that has been proposed by Liberti (4OR 5(3):231–245, 2007, https://doi.org/10.1007/s10288-006-0015-3). The new conditions resolve inconsistencies that can occur when the original method is used. We also present a mixed-integer linear program to compute a minimally sized linearization. When all the assignment constraints have non-overlapping variable support, this program is shown to have a totally unimodular constraint matrix. Finally, we give a polynomial-time combinatorial algorithm that is exact in this case and can be used as a heuristic otherwise.

software and compilers for embedded systems | 2013

Solving the simple offset assignment problem as a traveling salesman

Michael Jünger; Sven Mallach

In this paper, we present an exact approach to the Simple Offset Assignment problem arising in the domain of address code generation for digital signal processors. It is based on transformations to weighted Hamiltonian cycle problems and integer linear programming. To the best of our knowledge, it is the rst approach capable to solve all instances of the established OffsetStone benchmark set to optimality within reasonable time. It therefore enables the rst evaluation of the quality of several heuristics relative to the optimum solutions. Further, using the same transformations, we present a novel improvement heuristic that provides a well-tunable trade-off between running time and solution quality.

Journal of Combinatorial Optimization | 2018

Improved mixed-integer programming models for the multiprocessor scheduling problem with communication delays

Sven Mallach

We revise existing and introduce new mixed-integer programming models for the Multiprocessor scheduling problem with communication delays. The basis for both is the identification of two major modeling strategies one of which can be considered ordering-based, and the other assignment-based. We first reveal redundancies in the encoding of feasible solutions found in present formulations and discuss how they can be avoided. For the assignment-based approach, we propose new inequalities that lead to provably stronger continuous relaxations and better performance in practice. Moreover, we derive a third, novel modeling strategy and show how to more compactly linearize assignment formulations with quadratic constraints. In a comprehensive experimental comparison of representative models that reflect the state-of-the-art in terms of strength and size, we evaluate not only running times but also the obtained lower and upper bounds on the makespan for the harder instances of a large scale benchmark set.

international workshop on combinatorial algorithms | 2017

Linear Ordering Based MIP Formulations for the Vertex Separation or Pathwidth Problem

Sven Mallach

We consider the task to compute the pathwidth of a graph which has been shown to be equivalent to the vertex separation problem. The latter is naturally modeled as a linear ordering problem w.r.t. the vertices of the graph. Mixed-integer programs proposed so far express linear orders using either position or set assignment variables. As we show, the lower bound on the pathwidth obtained when solving their linear programming relaxations is zero for any directed graph. We then present a new formulation based on conventional linear ordering variables and a slightly different perspective on the problem that sustains stronger lower bounds. An experimental evaluation of three mixed-integer programs, each representing one of the different modeling schemes, displays their potentials and limitations when used to solve the problem to optimality.

Discrete Optimization | 2016

An integer programming approach to optimal basic block instruction scheduling for single-issue processors

Michael Jünger; Sven Mallach

Abstract We present a novel integer programming formulation for basic block instruction scheduling on single-issue processors. The problem can be considered as a very general sequential task scheduling problem with delayed precedence constraints. Our model is based on the linear ordering problem and has, in contrast to the last IP model proposed, numbers of variables and constraints that are strongly polynomial in the instance size. Combined with improved preprocessing techniques and given a time limit of ten minutes of CPU and system time, our branch-and-cut implementation is capable to solve all but eleven of the 369,861 basic blocks of the SPEC 2000 integer and floating point benchmarks to proven optimality. This is competitive to the current state-of-the art constraint programming approach that has also been evaluated on this test suite.

Explore More