Nicolas Weber

Technische Universität Darmstadt

Publication

Featured research published by Nicolas Weber.

architectural support for programming languages and operating systems | 2013

Fast dynamic memory allocator for massively parallel architectures

Sven Widmer; Dominik Wodniok; Nicolas Weber; Michael Goesele

Dynamic memory allocation in massively parallel systems often suffers from drastic performance degradation due to the required global synchronization. This is especially true when many allocation or deallocation requests occur in parallel. We propose a method to alleviate this problem by making use of the SIMD parallelism found in most current massively parallel hardware. More specifically, we propose a hybrid dynamic memory allocator operating at the level of the SIMD-parallel warp. Using additional constraints that can be fulfilled for a large class of practically relevant algorithms and hardware systems, we are able to significantly speed up dynamic allocation. We present and evaluate a prototypical implementation for modern CUDA-enabled graphics cards, achieving an overall speedup of up to several orders of magnitude.
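The abstract does not spell out the allocator's internals. One widely used way to exploit warp-level SIMD parallelism for allocation is warp aggregation: the threads of a warp combine their requests with a prefix sum, and a single leader performs one atomic update on the shared heap pointer for the whole warp. The Python simulation below is a minimal sketch of that general idea, assuming a 32-wide warp and a simple bump-pointer heap; `BumpHeap`, `per_thread_malloc`, and `warp_malloc` are hypothetical names, and this is not the authors' allocator. On a real GPU the same pattern would typically use warp intrinsics such as `__ballot_sync`/`__shfl_sync` and one `atomicAdd`.

```python
from itertools import accumulate

WARP_SIZE = 32  # assumption: warp width of current NVIDIA GPUs


class BumpHeap:
    """Toy global heap: a bump pointer whose updates stand in for atomics."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.top = 0
        self.atomic_ops = 0  # counts simulated global atomic operations

    def atomic_add(self, nbytes):
        self.atomic_ops += 1
        old = self.top
        if old + nbytes > self.capacity:
            raise MemoryError("heap exhausted")
        self.top = old + nbytes
        return old  # like atomicAdd, return the value before the update


def per_thread_malloc(heap, sizes):
    # Naive scheme: every thread issues its own atomic on the shared pointer.
    return [heap.atomic_add(s) for s in sizes]


def warp_malloc(heap, sizes):
    # Warp-aggregated scheme: exclusive prefix sum of the requests inside each
    # warp, then one atomic by the warp "leader" for the warp's total size.
    offsets = []
    for start in range(0, len(sizes), WARP_SIZE):
        warp = sizes[start:start + WARP_SIZE]
        exclusive = [0] + list(accumulate(warp))[:-1]
        base = heap.atomic_add(sum(warp))
        offsets.extend(base + e for e in exclusive)
    return offsets


sizes = [16] * 64  # 64 threads (two warps), 16 bytes each
naive_heap, warp_heap = BumpHeap(1 << 20), BumpHeap(1 << 20)
naive = per_thread_malloc(naive_heap, sizes)
aggregated = warp_malloc(warp_heap, sizes)
print(naive_heap.atomic_ops, warp_heap.atomic_ops)  # 64 2
print(naive == aggregated)                          # True
```

The sketch ignores alignment and deallocation; its only point is that contention on the global pointer drops from one atomic per thread to one atomic per warp.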

international conference on computer graphics and interactive techniques | 2016

Rapid, detail-preserving image downscaling

Nicolas Weber; Michael Waechter; Sandra C. Amend; Stefan Guthe; Michael Goesele

Image downscaling is arguably the most frequently used image processing tool. We present an algorithm based on convolutional filters where input pixels contribute more to the output image the more their color deviates from their local neighborhood, which preserves visually important details. In a user study, we verify that users prefer our results over those of related work. Our efficient GPU implementation works in real time when downscaling images from 24 M to 70 k pixels. Further, we demonstrate empirically that our method can be successfully applied to videos.
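The core idea in this abstract, that a pixel's weight grows with its deviation from its local neighborhood, can be illustrated with a deliberately simplified sketch. It assumes a grayscale image stored as a list of rows, an integer scale factor, the block mean as the "local neighborhood", and a hypothetical exponent `lam`; the published method's actual filter and neighborhood definition may differ.

```python
def downscale_detail_preserving(img, factor, lam=1.0):
    """Downscale a grayscale image (list of rows) by an integer factor.

    Simplified sketch: each output pixel averages its factor x factor input
    block, weighting every input pixel by |pixel - block mean| ** lam, so
    pixels that deviate from their neighborhood contribute more.
    """
    h, w = len(img), len(img[0])
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by factor")
    out = []
    for by in range(0, h, factor):
        row = []
        for bx in range(0, w, factor):
            block = [img[y][x]
                     for y in range(by, by + factor)
                     for x in range(bx, bx + factor)]
            mean = sum(block) / len(block)
            weights = [abs(p - mean) ** lam for p in block]
            total = sum(weights)
            # A uniform block has all-zero weights: fall back to the plain mean.
            row.append(mean if total == 0 else
                       sum(wt * p for wt, p in zip(weights, block)) / total)
        out.append(row)
    return out


img = [[0, 0, 9, 9],
       [0, 1, 9, 9]]
print(downscale_detail_preserving(img, 2))  # [[0.5, 9.0]]
```

A plain box filter would turn the left block into 0.25; the deviation weighting keeps the single bright pixel more visible (0.5). Setting `lam=0` makes all weights equal and recovers the box filter.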

eurographics workshop on parallel graphics and visualization | 2014

Auto-tuning complex array layouts for GPUs

Nicolas Weber; Michael Goesele

The continuing evolution of Graphics Processing Units (GPUs) has brought rapid performance increases over the years. But with each new hardware generation, the constraints for programming them efficiently have changed. Programs have to be tuned towards one specific hardware generation to unleash its full potential. This is time-consuming and costly, as vendors tend to release a new generation every 18 months. It is therefore important to auto-tune GPU code to achieve GPU-specific improvements, using either static or empirical profiling to adjust parameters or to change the kernel implementation. We introduce a new approach to automatically improve memory access on GPUs. Our system generates an application-specific library which abstracts the memory access for complex arrays on the host and GPU side. This allows the code to be optimized by exchanging the memory layout without recompiling the application, as all necessary layouts are pre-compiled into the library. Our implementation is able to speed up real-world applications by up to an order of magnitude and even outperforms hand-tuned implementations.
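The key mechanism here is that application code accesses array elements through an abstraction, so the physical memory layout can be exchanged without touching (or recompiling) the calling code. The Python sketch below illustrates that idea with three common layouts: Array of Structs (AoS), Struct of Arrays (SoA), and a tiled hybrid (AoSoA). `LayoutArray`, the layout functions, and the tile width of 4 are hypothetical illustrations, not the generated library's actual interface.

```python
def aos(i, f, n, k):
    # Array of Structs: all k fields of record i are adjacent.
    return i * k + f


def soa(i, f, n, k):
    # Struct of Arrays: field f of all n records is adjacent.
    return f * n + i


def aosoa(i, f, n, k, tile=4):
    # Array of Structs of Arrays: tiles of `tile` records, SoA inside a tile.
    # Assumes n is a multiple of `tile`.
    return (i // tile) * tile * k + f * tile + i % tile


class LayoutArray:
    """n records with k scalar fields; the layout is exchangeable at runtime."""

    def __init__(self, n, k, layout):
        self.n, self.k, self.layout = n, k, layout
        self.data = [0.0] * (n * k)

    def _offset(self, i, f):
        return self.layout(i, f, self.n, self.k)

    def get(self, i, f):
        return self.data[self._offset(i, f)]

    def set(self, i, f, value):
        self.data[self._offset(i, f)] = value

    def relayout(self, new_layout):
        # Copy every element into the new layout; callers keep using get/set.
        new = [0.0] * (self.n * self.k)
        for i in range(self.n):
            for f in range(self.k):
                new[new_layout(i, f, self.n, self.k)] = self.get(i, f)
        self.data, self.layout = new, new_layout


arr = LayoutArray(n=8, k=3, layout=aos)
for i in range(8):
    for f in range(3):
        arr.set(i, f, 10 * i + f)  # value encodes (record, field)
arr.relayout(soa)                  # swap layout; accessor calls unchanged
print(arr.get(5, 2))               # 52
print(arr.data[:8])                # [0, 10, 20, 30, 40, 50, 60, 70]
```

On a GPU, SoA usually lets consecutive threads reading the same field touch consecutive addresses (coalesced access), while AoS can win when each thread reads all fields of one record; which layout is best depends on the kernel and the hardware, which is why the choice is worth auto-tuning.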

ieee international conference on high performance computing data and analytics | 2015

Guided profiling for auto-tuning array layouts on GPUs

Nicolas Weber; Sandra C. Amend; Michael Goesele

Auto-tuning for Graphics Processing Units (GPUs) has become very popular in recent years. It removes the need to hand-tune GPU code, especially when a new hardware architecture is released. Our auto-tuner optimizes memory access patterns, which are key to exploiting the full performance of modern GPUs. As the memory hierarchy has historically changed in nearly every GPU generation, it was necessary to reoptimize the code for each of these new architectures. Unfortunately, the solution space for memory optimizations in large applications can easily reach millions of configurations for a single kernel. This vast number of implementations cannot be fully evaluated in a feasible time. In this paper, we present an adaptive profiling algorithm that aims at finding a near-optimal configuration within a fraction of the global optimum, while reducing the profiling time by several orders of magnitude compared to an exhaustive search. Our algorithm targets, and is evaluated on, large real-world applications.
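The abstract does not describe the adaptive profiling algorithm itself, but the size of the gap it addresses is easy to show. The sketch below swaps in a plain coordinate-descent search (tune one array's layout at a time while holding the others fixed) as a stand-in and compares its number of evaluations with an exhaustive search. The 10 arrays, 3 layouts, and `PENALTY` timings are hypothetical; a real auto-tuner would measure kernel times instead of calling a cost function.

```python
from itertools import product

LAYOUTS = ("AoS", "SoA", "AoSoA")

# Hypothetical per-layout kernel time (ms) for every array; a real auto-tuner
# would obtain these numbers by profiling the application.
PENALTY = {"AoS": 3.0, "SoA": 1.0, "AoSoA": 2.0}


def synthetic_cost(config):
    """Toy cost model: total time is the sum of per-array contributions."""
    return sum(PENALTY[layout] for layout in config)


def exhaustive(choices, cost):
    """Evaluate every configuration; return (best, number of evaluations)."""
    configs = list(product(*choices))
    return min(configs, key=cost), len(configs)


def coordinate_descent(choices, cost, sweeps=2):
    """Tune one parameter at a time while holding the others fixed."""
    config = [opts[0] for opts in choices]
    evaluations = 0
    for _ in range(sweeps):
        for d, opts in enumerate(choices):
            best_opt, best_cost = config[d], float("inf")
            for opt in opts:
                candidate = tuple(config[:d] + [opt] + config[d + 1:])
                c = cost(candidate)
                evaluations += 1
                if c < best_cost:
                    best_opt, best_cost = opt, c
            config[d] = best_opt
    return tuple(config), evaluations


choices = [LAYOUTS] * 10  # 10 arrays, 3 candidate layouts each
best_ex, n_ex = exhaustive(choices, synthetic_cost)
best_cd, n_cd = coordinate_descent(choices, synthetic_cost)
print(n_ex, n_cd)          # 59049 60
print(best_cd == best_ex)  # True
```

Because this toy cost is separable, the greedy search finds the true optimum; in real applications arrays interact, so such a search only approximates the optimum, which is the trade-off the paper's "near-optimal within a fraction of the global optimum" framing refers to.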

ACM Transactions on Architecture and Code Optimization | 2017

MATOG: Array Layout Auto-Tuning for CUDA

Nicolas Weber; Michael Goesele

Optimal code performance is (besides correctness and accuracy) the most important objective in compute-intensive applications. Many of these applications use Graphics Processing Units (GPUs) because of their high compute power. However, because of their massively parallel architecture, the code has to be specifically adjusted to the underlying hardware to achieve optimal performance and therefore has to be reoptimized for each new generation. In reality, this is usually not done, as production code is normally at least several years old and nobody has the time to continuously adjust existing code to new hardware. In recent years, more and more approaches have emerged that automatically tune the performance of applications toward the underlying hardware. In this article, we present the MATOG auto-tuner and its concepts. It abstracts the array memory access in CUDA applications and automatically optimizes the code for the GPUs in use. MATOG requires only a few profiling runs to analyze even complex applications, while achieving significant speedups over non-optimized code, regardless of the GPU generation and without the need to manually tune the code.

Proceedings of the ACM Workshop on Software Engineering Methods for Parallel and High Performance Applications | 2016

Adaptive GPU Array Layout Auto-Tuning

Nicolas Weber; Michael Goesele

Optimal performance is an important goal in compute-intensive applications. For GPU applications, this requires a lot of experience and knowledge about the algorithms and the underlying hardware, making them an ideal target for auto-tuning approaches. We present an auto-tuner which optimizes array layouts in CUDA applications. Depending on the data and program parameters, kernels can have varying optimal configurations. We thus adjust array layouts adaptively at runtime and achieve or even exceed the performance of hand-optimized code. We automatically detect data characteristics to identify different performance scenarios without user input or additional programming. We perform an empirical analysis of the application in order to construct our decision models. In principle, our adaptive optimization requires profiling data for an extremely high number of scenarios, which cannot be exhaustively evaluated for complex applications. We solve this by extending a previously published method that is able to efficiently profile single kernel calls and enhancing it to find application-wide optimal solutions. Our method is able to optimize applications in a few minutes, reaching speedups of up to 20% compared to hand-optimized code.
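The abstract mentions decision models that select array layouts at runtime from detected data characteristics, without naming the features or the model. As a minimal sketch, assume the only characteristic is the input size, bucketed by hypothetical thresholds, and that the decision model simply picks the layout with the lowest mean profiled time in the matching bucket. `AdaptiveLayoutSelector` and all timings below are illustrative, not the paper's model.

```python
import bisect


class AdaptiveLayoutSelector:
    """Pick a layout per kernel call from profiled timings bucketed by size."""

    def __init__(self, bucket_edges):
        self.edges = sorted(bucket_edges)  # hypothetical size thresholds
        self.timings = {}                  # (bucket, layout) -> list of ms

    def bucket(self, n):
        return bisect.bisect_right(self.edges, n)

    def record(self, n, layout, ms):
        # Store one profiling measurement for input size n.
        self.timings.setdefault((self.bucket(n), layout), []).append(ms)

    def choose(self, n, default="AoS"):
        # Use the fastest layout seen in this size bucket, else a default.
        b = self.bucket(n)
        means = {layout: sum(v) / len(v)
                 for (bb, layout), v in self.timings.items() if bb == b}
        return min(means, key=means.get) if means else default


sel = AdaptiveLayoutSelector(bucket_edges=[1_000, 100_000])
# Hypothetical profiling results: AoS wins on small inputs, SoA on large ones.
sel.record(500, "AoS", 0.10)
sel.record(500, "SoA", 0.12)
sel.record(50_000, "AoS", 4.0)
sel.record(50_000, "SoA", 2.5)
print(sel.choose(800))     # AoS
print(sel.choose(20_000))  # SoA
print(sel.choose(10**7))   # AoS (no data for this bucket, so the default)
```

Real data characteristics could include more than size, and a real system would profile each scenario rather than rely on a fixed default; the sketch only shows how a cheap runtime lookup can replace a single static layout choice.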

Social Science Computer Review | 2018

Prospect for Knowledge in Survey Data: An Artificial Neural Network Sensitivity Analysis

Patrick Weber; Nicolas Weber; Michael Goesele; Rüdiger Kabst

Policy making depends on good knowledge of the corresponding target audience. To maximize the intended outcome, it is essential to understand the underlying relationships. Machine learning techniques are capable of analyzing data containing behavioral aspects, evaluations, attitudes, and social values. We show how existing machine learning techniques can be used to identify behavioral aspects of human decision-making and to predict human behavior. These techniques make it possible to extract high-resolution decision functions from which conclusions on human behavior can be drawn. Our focus is on voter turnout, for which we use data acquired by the European Social Survey on the German national vote. We show how to train an artificial expert and how to extract the behavioral aspects to build optimized policies. Our method achieves an increase in adjusted R² of 102% compared to a classic logistic regression prediction. We further evaluate the performance of our method compared to other machine learning techniques such as support vector machines and random forests. The results show that it is possible to better understand unknown variable relationships.
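The abstract names an artificial neural network sensitivity analysis but not its exact procedure. A common form is a one-at-a-time profile: vary a single input across a range while holding the others at a baseline, and record how the model's prediction responds. The sketch below assumes that form; `toy_model` is a hypothetical logistic stand-in for a trained network with invented coefficients, and `adjusted_r2` implements the standard adjusted R² formula for n observations and p predictors.

```python
import math


def sensitivity_profile(model, baseline, feature, values):
    """One-at-a-time sensitivity: vary one input, hold the others at baseline."""
    profile = []
    for v in values:
        x = list(baseline)
        x[feature] = v
        profile.append((v, model(x)))
    return profile


def adjusted_r2(r2, n, p):
    """Standard adjusted R^2 for n observations and p predictors."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def toy_model(x):
    # Hypothetical stand-in for a trained network: turnout probability from
    # two standardized survey features (e.g. political interest, age).
    z = -0.5 + 1.2 * x[0] + 0.3 * x[1]
    return 1 / (1 + math.exp(-z))


profile = sensitivity_profile(toy_model, baseline=[0.0, 0.0], feature=0,
                              values=[-2.0, 0.0, 2.0])
for value, prob in profile:
    print(f"interest={value:+.1f} -> turnout probability {prob:.3f}")
```

With a real trained network, the shape of such profiles (monotone, saturating, or non-monotone) is what exposes the decision function; the adjusted R² helper penalizes a fit for its number of predictors, which is what makes comparisons across models of different size fair.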

computer vision and pattern recognition | 2018