Publication


Featured research published by Timo Bingmann.


ACM Journal of Experimental Algorithmics (JEA) | 2016

Inducing Suffix and LCP Arrays in External Memory

Timo Bingmann; Johannes Fischer; Vitaly Osipov

We consider full text index construction in external memory (EM). Our first contribution is an inducing algorithm for suffix arrays in external memory, which runs in sorting complexity. Practical tests show that this algorithm outperforms the previous best EM suffix sorter [Dementiev et al., JEA 2008] by a factor of about two in time and I/O volume. Our second contribution is to augment the first algorithm to also construct the array of longest common prefixes (LCPs). This yields a new internal memory LCP array construction algorithm and the first EM construction algorithm for LCP arrays. The overhead in time and I/O volume for this extended algorithm over plain suffix array construction is roughly two. Our algorithms scale far beyond problem sizes previously considered in the literature (text size of 80GiB using only 4GiB of RAM in our experiments).
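
For orientation, the sketch below shows the two structures the algorithm computes: the suffix array lists the starting positions of all suffixes in lexicographic order, and the LCP array stores the length of the longest common prefix of lexicographically adjacent suffixes. This is a naive in-memory illustration of the definitions only, not the paper's external memory algorithm.

```cpp
// Naive illustration of the suffix array (SA) and LCP array for a small
// string. Only shows the definitions; not the external memory construction.
#include <algorithm>
#include <iostream>
#include <numeric>
#include <string>
#include <vector>

int main() {
    std::string text = "banana";
    int n = text.size();

    // SA[i] = starting position of the i-th smallest suffix.
    std::vector<int> sa(n);
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        return text.compare(a, n - a, text, b, n - b) < 0;
    });

    // LCP[i] = longest common prefix of the suffixes starting at SA[i-1]
    // and SA[i]; LCP[0] is 0 by convention.
    std::vector<int> lcp(n, 0);
    for (int i = 1; i < n; ++i) {
        int a = sa[i - 1], b = sa[i], l = 0;
        while (a + l < n && b + l < n && text[a + l] == text[b + l]) ++l;
        lcp[i] = l;
    }

    for (int i = 0; i < n; ++i)
        std::cout << sa[i] << '\t' << lcp[i] << '\t'
                  << text.substr(sa[i]) << '\n';
}
```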


IEEE International Conference on Big Data | 2016

Thrill: High-performance algorithmic distributed batch data processing with C++

Timo Bingmann; Michael Axtmann; Emanuel Jöbstl; Sebastian Lamm; Huyen Chau Nguyen; Alexander Noe; Sebastian Schlag; Matthias Stumpp; Tobias Sturm; Peter Sanders

We present the design and a first performance evaluation of Thrill — a prototype of a general purpose big data processing framework with a convenient data-flow style programming interface. Thrill is somewhat similar to Apache Spark and Apache Flink with at least two main differences. First, Thrill is based on C++ which enables performance advantages due to direct native code compilation, a more cache-friendly memory layout, and explicit memory management. In particular, Thrill uses template meta-programming to compile chains of subsequent local operations into a single binary routine without intermediate buffering and with minimal indirections. Second, Thrill uses arrays rather than multisets as its primary data structure which enables additional operations like sorting, prefix sums, window scans, or combining corresponding fields of several arrays (zipping). We compare Thrill with Apache Spark and Apache Flink using five kernels from the HiBench suite. Thrill is consistently faster and often several times faster than the other frameworks. At the same time, the source codes have a similar level of simplicity and abstraction.
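
The template chaining claim can be illustrated with a small sketch. The class and function names below are invented for the example and are not Thrill's actual API; the point is only how templated operation nodes let the compiler fuse a chain of local operations into a single loop with no intermediate buffers.

```cpp
// Illustrative only: chaining local operations through templates so the
// compiler fuses them into one loop (the design idea, not Thrill's API).
#include <iostream>
#include <vector>

template <typename T>
struct SourceNode {
    const std::vector<T>& data;
    template <typename Emit>
    void Run(Emit emit) const {
        for (const T& x : data) emit(x);  // push items to the next stage
    }
};

template <typename Input, typename MapFn>
struct MapNode {
    Input in;
    MapFn fn;
    template <typename Emit>
    void Run(Emit emit) const {
        in.Run([&](const auto& x) { emit(fn(x)); });  // no buffering
    }
};

template <typename Input, typename MapFn>
MapNode<Input, MapFn> Map(Input in, MapFn fn) {
    return {in, fn};
}

int main() {
    std::vector<int> v = {1, 2, 3, 4};
    auto pipeline = Map(Map(SourceNode<int>{v}, [](int x) { return x * x; }),
                        [](int x) { return x + 1; });
    pipeline.Run([](int x) { std::cout << x << '\n'; });  // prints 2 5 10 17
}
```

Because each node's Run() is instantiated with the next stage's emitter, the chain collapses into nested inlined calls rather than materialized intermediate arrays.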


Algorithmica | 2017

Engineering Parallel String Sorting

Timo Bingmann; Andreas Eberle; Peter Sanders

We discuss how string sorting algorithms can be parallelized on modern multi-core shared memory machines. As a synthesis of the best sequential string sorting algorithms and successful parallel sorting algorithms for atomic objects, we first propose string sample sort. The algorithm makes effective use of the memory hierarchy, uses additional word level parallelism, and largely avoids branch mispredictions. Then we focus on NUMA architectures, and develop parallel multiway LCP-merge and -mergesort to reduce the number of random memory accesses to remote nodes. Additionally, we parallelize variants of multikey quicksort and radix sort that are also useful in certain situations. As base-case sorter for LCP-aware string sorting we describe sequential LCP-insertion sort which calculates the LCP array and accelerates its insertions using it. Comprehensive experiments on five current multi-core platforms are then reported and discussed. The experiments show that our parallel string sorting implementations scale very well on real-world inputs and modern machines.
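
As context for the parallel variants, a plain sequential multikey quicksort (following Bentley and Sedgewick) looks roughly as follows; this is a sketch of the textbook algorithm, not the paper's tuned or parallel implementation.

```cpp
// Sequential multikey (three-way string) quicksort sketch.
#include <algorithm>
#include <string>
#include <vector>

static int char_at(const std::string& s, size_t d) {
    return d < s.size() ? (unsigned char)s[d] : -1;  // -1 marks end of string
}

// Sorts a[lo, hi); all strings in this range share their first d characters.
static void mkqs(std::vector<std::string>& a, int lo, int hi, size_t d) {
    if (hi - lo < 2) return;
    int pivot = char_at(a[lo + (hi - lo) / 2], d);
    int lt = lo, gt = hi, i = lo;            // partition into <, =, > pivot
    while (i < gt) {
        int c = char_at(a[i], d);
        if (c < pivot)      std::swap(a[lt++], a[i++]);
        else if (c > pivot) std::swap(a[i], a[--gt]);
        else                ++i;
    }
    mkqs(a, lo, lt, d);                      // strings smaller at position d
    if (pivot >= 0) mkqs(a, lt, gt, d + 1);  // equal at d: go one char deeper
    mkqs(a, gt, hi, d);                      // strings larger at position d
}

void multikey_quicksort(std::vector<std::string>& a) {
    mkqs(a, 0, (int)a.size(), 0);
}
```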


ACM Symposium on Parallel Algorithms and Architectures | 2015

Practical Massively Parallel Sorting

Michael Axtmann; Timo Bingmann; Peter Sanders; Christian Schulz

Previous parallel sorting algorithms do not scale to the largest available machines, since they either have prohibitive communication volume or prohibitive critical path length. We describe algorithms that are a viable compromise and overcome this gap both in theory and practice. The algorithms are multi-level generalizations of the known algorithms sample sort and multiway mergesort. In particular, our sample sort variant turns out to be very scalable both in theory and practice, scaling up to 2^15 MPI processes with outstanding performance, in particular for medium-sized inputs. Some tools we develop may be of independent interest: a simple, practical, and flexible sorting algorithm for very small inputs, a near-linear-time optimal algorithm for solving a constrained bin packing problem, and an algorithm for data delivery that guarantees a small number of message startups on each processor.
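
For background, the single-level sample sort partition step that the paper generalizes to multiple levels can be sketched as follows; the random sampling and oversampling factor are illustrative defaults, and the distributed splitter selection and data delivery are where the paper's contributions lie.

```cpp
// Single-level sample sort partition sketch (sequential, illustrative).
#include <algorithm>
#include <cstdlib>
#include <vector>

std::vector<std::vector<int>> sample_sort_partition(
    const std::vector<int>& data, int num_buckets, int oversampling = 16) {
    // 1. Draw a random sample (data is assumed non-empty) and sort it.
    std::vector<int> sample;
    for (int i = 0; i < num_buckets * oversampling; ++i)
        sample.push_back(data[std::rand() % data.size()]);
    std::sort(sample.begin(), sample.end());

    // 2. Pick equidistant splitters from the sorted sample.
    std::vector<int> splitters;
    for (int i = 1; i < num_buckets; ++i)
        splitters.push_back(sample[i * oversampling]);

    // 3. Classify each element into its bucket by binary search.
    std::vector<std::vector<int>> buckets(num_buckets);
    for (int x : data) {
        int b = std::upper_bound(splitters.begin(), splitters.end(), x)
                - splitters.begin();
        buckets[b].push_back(x);
    }
    // 4. Each bucket would then be sorted recursively or on its own processor.
    return buckets;
}
```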


European Symposium on Algorithms | 2013

Parallel String Sample Sort

Timo Bingmann; Peter Sanders

We discuss how string sorting algorithms can be parallelized on modern multi-core shared memory machines. As a synthesis of the best sequential string sorting algorithms and successful parallel sorting algorithms for atomic objects, we propose string sample sort. The algorithm makes effective use of the memory hierarchy, uses additional word level parallelism, and largely avoids branch mispredictions. Additionally, we parallelize variants of multikey quicksort and radix sort that are also useful in certain situations.
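
A sketch of the classification idea behind string sample sort: derive a fixed-width integer key from the first characters of each string and assign a bucket by comparing against splitter keys, so most classifications avoid character-by-character comparisons. This is a simplification with invented helper names; the engineered algorithm additionally uses an implicit splitter tree and separate equality buckets for keys that tie on the cached prefix. Splitters are assumed to be sorted and strings free of NUL bytes.

```cpp
// Classify strings into buckets using 8-byte prefix keys (simplified sketch).
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Pack the first 8 characters into one integer; shorter strings are padded
// with zero bytes, which preserves lexicographic order for NUL-free strings.
uint64_t prefix_key(const std::string& s) {
    uint64_t k = 0;
    for (size_t i = 0; i < 8; ++i)
        k = (k << 8) | (i < s.size() ? (unsigned char)s[i] : 0);
    return k;
}

std::vector<size_t> classify(const std::vector<std::string>& strings,
                             const std::vector<std::string>& splitters) {
    std::vector<uint64_t> splitter_keys;
    for (const auto& sp : splitters) splitter_keys.push_back(prefix_key(sp));

    std::vector<size_t> bucket(strings.size());
    for (size_t i = 0; i < strings.size(); ++i) {
        uint64_t k = prefix_key(strings[i]);
        bucket[i] = std::upper_bound(splitter_keys.begin(),
                                     splitter_keys.end(), k)
                    - splitter_keys.begin();
    }
    return bucket;  // strings that tie on the prefix share a bucket here
}
```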


arXiv: Data Structures and Algorithms | 2018

Scalable String and Suffix Sorting: Algorithms, Techniques, and Tools

Timo Bingmann

This dissertation focuses on two fundamental sorting problems: string sorting and suffix sorting. The first part considers parallel string sorting on shared-memory multi-core machines, the second part external memory suffix sorting using the induced sorting principle, and the third part distributed external memory suffix sorting with a new distributed algorithmic big data framework named Thrill.

Sorting strings or vectors is a basic algorithmic challenge different from integer sorting because it is important to access components of the keys to avoid repeated operations on the entire string. We focus on sorting large inputs which fit into the RAM of a shared-memory machine. String sorting is needed for instance in database index construction, suffix sorting algorithms, and to order high-dimensional geometric data. We first survey engineered variants of basic sequential string sorting algorithms and perform an extensive experimental evaluation to measure their performance. Furthermore, we perform experiments to quantify parallel memory bandwidth and latency as preliminary work for designing parallel string sorting algorithms.

We then propose string sample sort as an adaptation of sample sort to string objects and present its engineered version Super Scalar String Sample Sort. This parallel-ready algorithm runs in O(D/w + n log n) expected time, makes effective use of the cache hierarchy, uses word- and instruction-level parallelism, and avoids branch mispredictions. Our parallelization named Parallel Super Scalar String Sample Sort (pS5) employs voluntary work sharing for load balancing and is the overall best performing algorithm on single-socket multi-core machines in our experiments. For platforms with non-uniform memory access (NUMA) we propose to run pS5 on each NUMA node independently and then merge the sorted string sequences. To accelerate the merge with longest common prefix (LCP) values we present a new LCP-aware multiway merge algorithm using a tournament tree. The merge algorithm is also used to construct a stand-alone LCP-aware K-way mergesort, which runs in O(D + n log n + n/K) time and benefits from long common prefixes in the input. Broadly speaking, we propose both multiway distribution-based string sorting with string sample sort and multiway merge-based string sorting with LCP-aware merge and mergesort, and engineer and parallelize both approaches.

We also present parallelizations of multikey quicksort and radix sort, and perform an extensive experimental evaluation using six machines and seven inputs. For all input instances, except random strings and URLs, pS5 achieves higher speedups on modern single-socket multi-core machines than our own parallel multikey quicksort and radix sort implementations, which are already better than any previous ones. On multi-socket NUMA machines pS5 combined with the LCP-aware top-level multiway merging was fastest on most inputs.

We then turn our focus to suffix sorting, which is equivalent to suffix array construction. The suffix array is one of the most popular text indexes; it enables fast substring search in DNA or text corpora, is used in compression applications, and is the basis for many string algorithms. When augmented with the LCP array and additional tables, the suffix array can emulate the suffix tree in a myriad of stringology algorithms. Our goal is to create fast and scalable suffix sorting algorithms to generate large suffix arrays for real-world inputs.
As an introduction to suffix array construction, we first present a brief survey of its principles and history. Our initial contribution to this field is eSAIS, the first external memory suffix sorting algorithm which uses the induced sorting principle. Its central loop is an elegant reformulation of this principle using an external memory priority queue, and our theoretical analysis shows that eSAIS requires at most Sort(17n) + Scan(9n) I/O volume. We then extend eSAIS to also construct the LCP array while suffix sorting, which yields the first implementation of fully external memory suffix and LCP array construction in the literature. Our experiments demonstrate that eSAIS is a factor of two faster than DC3, the previously best external memory suffix sorting implementation. After our initial publication of eSAIS, many authors showed interest in the topic, and we review their contributions and improvements over eSAIS.

For scaling to even larger inputs, we then consider suffix sorting on a distributed cluster machine. To harness the computational power of such a system in a convenient data-flow style functional programming paradigm, we propose the new high-performance distributed big data processing framework Thrill. Thrill's central concept is a distributed immutable array (DIA), which is a virtual array of C++ objects distributed onto the cluster. Such arrays can be manipulated using a small set of scalable primitives, such as mapping, reducing, and sorting. These are implemented using pipelined distributed external memory algorithms encapsulated as C++ template classes, which can be efficiently coupled to form large complex applications. Our Thrill prototype is evaluated using five micro benchmarks against the popular frameworks Apache Spark and Apache Flink on up to 16 hosts in the AWS Elastic Compute Cloud. Thrill consistently outperforms the other frameworks in all benchmarks and on all numbers of hosts.

Using Thrill we then implement five suffix sorting algorithms as a case study. Three are based on prefix doubling and two are variants of the linear-time difference cover algorithm DC. The implementation of these complex algorithms demonstrates the expressiveness of the scalable primitives provided by Thrill. They are also the first distributed external memory suffix sorters presented in the literature. We compare them experimentally against two hand-coded MPI implementations and the fastest non-distributed sequential suffix sorters. Our results show that algorithms implemented using Thrill are competitive with MPI programs but scale to larger inputs due to automatic usage of external memory. In the future, these implementations can benefit from improvements of Thrill such as fault tolerance or specialized sorting algorithms.
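
The prefix doubling principle behind three of the Thrill-based suffix sorters can be illustrated with a compact sequential sketch: after round k the suffixes are ordered by their first 2^k characters. This is only the textbook O(n log^2 n) skeleton; the dissertation's implementations are distributed external memory algorithms built from Thrill's primitives.

```cpp
// Textbook prefix doubling suffix array construction (illustrative only).
#include <algorithm>
#include <string>
#include <vector>

std::vector<int> suffix_array(const std::string& s) {
    int n = s.size();
    std::vector<int> sa(n), rank(n), tmp(n);
    for (int i = 0; i < n; ++i) { sa[i] = i; rank[i] = (unsigned char)s[i]; }
    for (int k = 1; n > 1; k *= 2) {
        // Compare suffixes by (rank of first k chars, rank of next k chars).
        auto cmp = [&](int a, int b) {
            if (rank[a] != rank[b]) return rank[a] < rank[b];
            int ra = a + k < n ? rank[a + k] : -1;
            int rb = b + k < n ? rank[b + k] : -1;
            return ra < rb;
        };
        std::sort(sa.begin(), sa.end(), cmp);
        // Re-rank: suffixes with equal rank pairs receive equal new ranks.
        tmp[sa[0]] = 0;
        for (int i = 1; i < n; ++i)
            tmp[sa[i]] = tmp[sa[i - 1]] + (cmp(sa[i - 1], sa[i]) ? 1 : 0);
        rank = tmp;
        if (rank[sa[n - 1]] == n - 1) break;  // all ranks distinct: done
    }
    return sa;
}
```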


Algorithm Engineering and Experimentation | 2013

Inducing Suffix and LCP Arrays in External Memory

Timo Bingmann; Johannes Fischer; Vitaly Osipov


Symposium on Experimental and Efficient Algorithms | 2015

A Bulk-Parallel Priority Queue in External Memory with STXXL

Timo Bingmann; Thomas Keh; Peter Sanders


arXiv: Programming Languages | 2018

Proving Equivalence Between Imperative and MapReduce Implementations Using Program Transformations

Bernhard Beckert; Timo Bingmann; Moritz Kiefer; Peter Sanders; Mattias Ulbrich; Alexander Weigl


arXiv: Logic in Computer Science | 2018

Relational Equivalence Proofs Between Imperative and MapReduce Algorithms

Bernhard Beckert; Timo Bingmann; Moritz Kiefer; Peter Sanders; Mattias Ulbrich; Alexander Weigl

Collaboration


Dive into Timo Bingmann's collaborations.

Top Co-Authors

Peter Sanders, Karlsruhe Institute of Technology
Michael Axtmann, Karlsruhe Institute of Technology
Alexander Weigl, Karlsruhe Institute of Technology
Bernhard Beckert, Karlsruhe Institute of Technology
Christian Schulz, Karlsruhe Institute of Technology
Johannes Fischer, Technical University of Dortmund
Mattias Ulbrich, Karlsruhe Institute of Technology
Moritz Kiefer, Karlsruhe Institute of Technology
Vitaly Osipov, Karlsruhe Institute of Technology
Alexander Noe, Karlsruhe Institute of Technology