Christian Hundt
University of Mainz
Publications
Featured research published by Christian Hundt.
European Conference on Parallel Processing | 2015
Moritz Schlarb; Christian Hundt; Bertil Schmidt
Many curricula for undergraduate studies in computer science provide a lecture on the fundamentals of parallel programming, such as multi-threaded computation on shared memory architectures using POSIX threads or OpenMP. The complex structure of parallel programs can be challenging, especially for inexperienced students. Thus, there is a latent need for software supporting the learning process. Subsequent lectures may cover more advanced parallelization techniques such as the Message Passing Interface (MPI) and the Compute Unified Device Architecture (CUDA). Unfortunately, the majority of students cannot easily access MPI clusters or modern hardware accelerators in order to effectively develop parallel programming skills. To overcome this, we present an interactive tool to aid both educators and students in the learning process. This paper describes the “System for AUtomated Code Evaluation” (SAUCE), a web-based open source application (available under the AGPL-3.0 license at https://github.com/moschlar/SAUCE) for programming assignment evaluation, and elaborates on its features specifically designed for the teaching of parallel programming. This tool enables educators to provide the required programming environments with a low barrier to entry since it is usable with just a web browser. SAUCE allows for immediate feedback and thus can be used interactively in classroom settings.
Brazilian Conference on Intelligent Systems | 2014
Christian Hundt; Bertil Schmidt; Elmar Schömer
Euclidean Distance (ED) and Dynamic Time Warping (DTW) are cornerstones in the field of time series data mining. Many high-level algorithms like kNN-classification, clustering or anomaly detection make extensive use of these distance measures as subroutines. Furthermore, the vast growth of recorded data produced by automated monitoring systems or integrated sensors establishes the need for efficient implementations. In this paper, we introduce linear-memory parallelization schemes for the alignment of a given query Q in a stream of time series data S for both ED and DTW using CUDA-enabled accelerators. The ED parallelization features a log-linear calculation scheme, in contrast to the naive implementation with quadratic time complexity, which allows for more efficient processing of long queries. The DTW implementation makes extensive use of a lower-bound cascade to avoid expensive calculations for unpromising candidates. Our CUDA parallelizations of both ED and DTW outperform state-of-the-art algorithms, namely the UCR-Suite. The achieved speedups range from one to two orders of magnitude, allowing for significantly faster processing of much larger data streams.
Journal of Computational Science | 2017
Jorge González-Domínguez; Christian Hundt; Bertil Schmidt
The growth of next generation sequencing datasets poses a challenge to the alignment of reads to reference genomes in terms of both accuracy and speed. In this work we present parSRA, a parallel framework to accelerate the execution of existing short read aligners on distributed-memory systems. parSRA can be used to parallelize a variety of short read alignment tools installed on the system without any modification to their source code. We show that our framework provides good scalability on a compute cluster for accelerating the popular BWA-MEM and Bowtie2 aligners. On average, it accelerates sequence alignments on 16 64-core nodes (1,024 cores in total) with a speedup of 10.48 compared to the original multithreaded tools running with 64 threads on one node. It is also faster and more scalable than the pMap and BigBWA frameworks. Source code of parSRA in C++ and UPC++, running on Linux systems with support for FUSE, is freely available at https://sourceforge.net/projects/parsra/.
JACC: Cardiovascular Imaging | 2018
Hinrich B. Winther; Christian Hundt; Bertil Schmidt; Christoph Czerner; Johann Bauersachs; Frank Wacker; Jens Vogel-Claussen
To the Editor: Cardiac magnetic resonance imaging–derived biventricular mass and function parameters, such as end-systolic volume, end-diastolic volume, ejection fraction, stroke volume (SV), and ventricular mass, are clinically well established. Image segmentation can be challenging and time-
Cluster Computing | 2017
Daniel Jünger; Christian Hundt; Jorge González-Domínguez; Bertil Schmidt
The discovery of higher-order epistatic interactions is an important task in the field of genome-wide association studies which allows for the identification of complex interaction patterns between multiple genetic markers. Some existing brute-force approaches explore the whole space of k-interactions in an exhaustive manner, resulting in almost intractable execution times. Computational cost can be reduced drastically by restricting the search space with suitable preprocessing filters which prune unpromising candidates. Other approaches mitigate the execution time by employing massively parallel accelerators in order to benefit from the vast computational resources of these architectures. In this paper, we combine a novel preprocessing filter, namely SingleMI, with massively parallel computation on modern GPUs to further accelerate epistasis discovery. Our implementation improves both runtime and accuracy compared to a previous GPU counterpart that employs mutual information clustering for prefiltering. SingleMI is open source software and publicly available at: https://github.com/sleeepyjack/singlemi/.
BMC Bioinformatics | 2017
Robin Kobus; Christian Hundt; André Müller; Bertil Schmidt
Background: Metagenomic sequencing studies are becoming increasingly popular, with prominent examples including the sequencing of human microbiomes and diverse environments. A fundamental computational problem in this context is read classification, i.e., the assignment of each read to a taxonomic label. Due to the large number of reads produced by modern high-throughput sequencing technologies and the rapidly increasing number of available reference genomes, software tools for fast and accurate metagenomic read classification are urgently needed. Results: We present cuCLARK, a read-level classifier for CUDA-enabled GPUs based on the CLARK method (fast and accurate classification of metagenomic sequences using reduced k-mers). Using the processing power of a single Titan X GPU, cuCLARK can reach classification speeds of up to 50 million reads per minute. Corresponding speedups for species-level (genus-level) classification range between 3.2 and 6.6 (3.7 and 6.4) compared to multi-threaded CLARK executed on a 16-core Xeon CPU workstation. Conclusion: cuCLARK can perform metagenomic read classification at superior speeds on CUDA-enabled GPUs. It is free software licensed under the GPL and can be downloaded at https://github.com/funatiq/cuclark free of charge.
Journal of Parallel and Distributed Computing | 2017
Christian Hundt; Moritz Schlarb; Bertil Schmidt
Prevalent hardware trends towards parallel architectures and algorithms create a growing demand for graduate students familiar with the programming of concurrent software. However, learning parallel programming is challenging due to complex communication and memory access patterns as well as the need to avoid common pitfalls such as deadlocks and race conditions. Hence, the learning process has to be supported by adequate software solutions in order to enable future computer scientists and engineers to write robust and efficient code. This paper discusses a selection of well-known parallel algorithms based on C++11 threads, OpenMP, MPI, and CUDA that can be interactively embedded in an HPC or parallel computing lecture using a unified framework for the automated evaluation of source code, namely the “System for AUtomated Code Evaluation” (SAUCE). SAUCE is free software licensed under AGPL-3.0 and can be downloaded at https://github.com/moschlar/SAUCE free of charge.
Bioinformatics | 2017
André Müller; Christian Hundt; Andreas Hildebrandt; Thomas Hankeln; Bertil Schmidt
Motivation: Metagenomic shotgun sequencing studies are becoming increasingly popular, with prominent examples including the sequencing of human microbiomes and diverse environments. A fundamental computational problem in this context is read classification, i.e. the assignment of each read to a taxonomic label. Due to the large number of reads produced by modern high-throughput sequencing technologies and the rapidly increasing number of available reference genomes, corresponding software tools suffer from either long runtimes, large memory requirements or low accuracy. Results: We introduce MetaCache, a novel software tool for read classification using the big data technique minhashing. Our approach performs context-aware classification of reads by computing representative subsamples of k-mers within both probed reads and locally constrained regions of the reference genomes. As a result, MetaCache consumes significantly less memory than the state-of-the-art read classifiers Kraken and CLARK while achieving highly competitive sensitivity and precision at comparable speed. For example, using NCBI RefSeq draft and completed genomes with a total length of around 140 billion bases as reference, MetaCache's database consumes only 62 GB of memory while both Kraken and CLARK fail to construct their respective databases on a workstation with 512 GB RAM. Our experimental results further show that classification accuracy continuously improves when increasing the amount of utilized reference genome data. Availability and implementation: MetaCache is open source software written in C++ and can be downloaded at http://github.com/muellan/metacache. Contact: [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online.
european conference on parallel processing | 2016
Daniel Jünger; Christian Hundt; Jorge González-Domínguez; Bertil Schmidt
Detecting higher-order epistatic interactions in Genome-Wide Association Studies (GWAS) remains a challenging task in the fields of genetic epidemiology and computer science. A number of algorithms have recently been proposed for epistasis discovery. However, they suffer from a high computational cost since statistical measures have to be evaluated for each possible combination of markers. Hence, many algorithms use additional filtering stages discarding potentially non-interacting markers in order to reduce the overall number of combinations to be examined. Among others, Mutual Information Clustering (MIC) is a common pre-processing filter for grouping markers into partitions using K-means clustering. Potentially interacting candidates for high-order epistasis are then examined exhaustively in a subsequent phase. However, analyzing real-world datasets of moderate size can still take several hours when performing the analysis on a single CPU. In this work, we propose a massively parallel computation scheme for the MIC algorithm targeting CUDA-enabled accelerators. Our implementation is able to perform epistasis discovery using more than 500,000 markers in just a couple of seconds, in contrast to several hours with the sequential MIC implementation. This runtime reduction of two orders of magnitude enables fast exploration of higher-order epistatic interactions even in large-scale GWAS datasets.
BMC Bioinformatics | 2016
Christian Hundt; Andreas Hildebrandt; Bertil Schmidt
Background: Gene Set Enrichment Analysis (GSEA) is a popular method to reveal significant dependencies between predefined sets of gene symbols and observed phenotypes by evaluating the deviation of gene expression values between cases and controls. An established measure of inter-class deviation, the enrichment score, is usually computed using a weighted running sum statistic over the whole set of gene symbols. Due to the lack of analytic expressions, the significance of enrichment scores is determined using a non-parametric estimation of their null distribution by permuting the phenotype labels of the probed patients. Accordingly, GSEA is a time-consuming task due to the large number of permutations required to accurately estimate the nominal p-value, a circumstance that is even more pronounced during multiple hypothesis testing since the estimate is lower-bounded by the inverse number of samples in permutation space. Results: We present rapidGSEA, a software suite consisting of two tools for facilitating permutation-based GSEA: cudaGSEA and ompGSEA. cudaGSEA is a CUDA-accelerated tool using fine-grained parallelization schemes on massively parallel architectures, while ompGSEA is a coarse-grained multi-threaded tool for multi-core CPUs. Nominal p-value estimation of 4,725 gene sets on a dataset consisting of 20,639 unique gene symbols and 200 patients (183 cases + 17 controls), each probing one million permutations, takes 19 hours on a Xeon CPU and less than one hour on a GeForce Titan X GPU, while the established GSEA tool from the Broad Institute (broadGSEA) takes roughly 13 days. Conclusion: cudaGSEA outperforms broadGSEA by around two orders of magnitude on a single Tesla K40c or GeForce Titan X GPU. ompGSEA provides around one order of magnitude speedup over broadGSEA on a standard Xeon CPU. The rapidGSEA suite is open-source software and can be downloaded at https://github.com/gravitino/cudaGSEA as a standalone application or as a package for the R framework.