Carsten Benthin | Researchain

Publication

Featured research published by Carsten Benthin.

International Conference on Computer Graphics and Interactive Techniques | 2014

Embree: a kernel framework for efficient CPU ray tracing

Ingo Wald; Sven Woop; Carsten Benthin; Gregory S. Johnson; Manfred Ernst

We describe Embree, an open source ray tracing framework for x86 CPUs. Embree is explicitly designed to achieve high performance in professional rendering environments in which complex geometry and incoherent ray distributions are common. Embree consists of a set of low-level kernels that maximize utilization of modern CPU architectures, and an API which enables these kernels to be used in existing renderers with minimal programmer effort. In this paper, we describe the design goals and software architecture of Embree, and show that for secondary rays in particular, the performance of Embree is competitive with (and often higher than) existing state-of-the-art methods on CPUs and GPUs.

2008 IEEE Symposium on Interactive Ray Tracing | 2008

Getting rid of packets - Efficient SIMD single-ray traversal using multi-branching BVHs -

Ingo Wald; Carsten Benthin; Solomon Boulos

While contemporary approaches to SIMD ray tracing typically rely on traversing packets of coherent rays through a binary data structure, we instead evaluate the alternative of traversing individual rays through a bounding volume hierarchy with a branching factor of 16. Though obviously less efficient than high-performance packet techniques for primary rays, we demonstrate that for less coherent secondary ray distributions this approach is at least competitive with (and often faster than) typical packet traversal techniques.

IEEE Transactions on Visualization and Computer Graphics | 2012

Combining Single and Packet-Ray Tracing for Arbitrary Ray Distributions on the Intel MIC Architecture

Carsten Benthin; Ingo Wald; Sven Woop; Manfred Ernst; William R. Mark

Wide-SIMD hardware is power and area efficient, but it is challenging to efficiently map ray tracing algorithms to such hardware especially when the rays are incoherent. The two most commonly used schemes are either packet tracing, or relying on a separate traversal stack for each SIMD lane. Both work great for coherent rays, but suffer when rays are incoherent: The former experiences a dramatic loss of SIMD utilization once rays diverge; the latter requires a large local storage, and generates multiple incoherent streams of memory accesses that present challenges for the memory system. In this paper, we introduce a single-ray tracing scheme for incoherent rays that uses just one traversal stack on 16-wide SIMD hardware. It uses a bounding-volume hierarchy with a branching factor of four as the acceleration structure, exploits four-wide SIMD in each box and primitive intersection test, and uses 16-wide SIMD by always performing four such node or primitive tests in parallel. We then extend this scheme to a hybrid tracing scheme that automatically adapts to varying ray coherence by starting out with a 16-wide packet scheme and switching to the new single-ray scheme as soon as rays diverge. We show that on the Intel Many Integrated Core architecture this hybrid scheme consistently, and over a wide range of scenes and ray distributions, outperforms both packet and single-ray tracing.
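The node test at the heart of single-ray traversal through a BVH with a branching factor of four can be illustrated with a small scalar sketch. This is not the paper's SIMD implementation: the loop over child boxes stands in for work that the paper performs in parallel across SIMD lanes. The names `inverse_direction` and `intersect_children`, the box layout, and the nearest-first ordering of the result are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

# A box is (min corner, max corner); this layout is an assumption of the sketch.
Box = Tuple[Sequence[float], Sequence[float]]


def inverse_direction(direction: Sequence[float]) -> Tuple[float, ...]:
    """Reciprocal of each direction component; zero components map to +inf."""
    return tuple(1.0 / d if d != 0.0 else math.inf for d in direction)


def intersect_children(origin: Sequence[float],
                       inv_dir: Sequence[float],
                       t_max: float,
                       boxes: Sequence[Box]) -> List[Tuple[float, int]]:
    """Slab-test one ray against the child boxes of a node.

    Returns (entry distance, child index) pairs for the children that are hit,
    nearest first. The degenerate case of an origin lying exactly on a slab
    plane of an axis with zero direction is not handled here.
    """
    hits = []
    for index, (lo, hi) in enumerate(boxes):
        t_near, t_far = 0.0, t_max
        for axis in range(3):
            t0 = (lo[axis] - origin[axis]) * inv_dir[axis]
            t1 = (hi[axis] - origin[axis]) * inv_dir[axis]
            t_near = max(t_near, min(t0, t1))
            t_far = min(t_far, max(t0, t1))
        if t_near <= t_far:
            hits.append((t_near, index))
    hits.sort()
    return hits
```

A traversal loop would push the returned children onto its single stack in far-to-near order so that the nearest child is visited first; that loop is omitted here.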

2008 IEEE Symposium on Interactive Ray Tracing | 2008

Adaptive ray packet reordering

Solomon Boulos; Ingo Wald; Carsten Benthin

Modern high-performance ray tracers use large ray packets and SIMD instruction sets to decrease both the computational and bandwidth cost compared to a single ray implementation. Current global illumination renderers, however, are still based around single ray implementations and interfaces. The presumption is that while packets have been shown to work well for highly coherent rays, in the presence of less coherent secondary ray distributions the gains of both packet and SIMD techniques dwindle rapidly. With low enough coherence, performance can be reduced to being as slow as reasonable single ray code - if not worse - so the benefit of packets for a global illumination system is assumed to be next to none. With SIMD width expanding in future architectures, leaving SIMD units underutilized means a massive loss in performance compared to the maximum performance achievable. In this paper, we present a method for recovering packet and SIMD coherence for incoherent secondary ray distributions through demand-driven reordering of rays into more coherent packets. We demonstrate that the reordering overhead is outweighed by the increased coherence within a prototypical implementation in the Manta realtime ray tracer among a wide variety of ray distributions, including diffuse path tracing.

International Conference on Computer Graphics and Interactive Techniques | 2013

Embree ray tracing kernels for CPUs and the Xeon Phi architecture

Sven Woop; Louis Feng; Ingo Wald; Carsten Benthin

Modern CPUs achieve high computational throughput by implementing increasingly wide SIMD vector units (such as 8-wide AVX or the 16-wide SIMD instructions of the Xeon Phi). Achieving optimal performance on these architectures requires leveraging these wide SIMD vector units effectively. We present Embree [Ernst and Woop 2011], an open source ray tracing library developed to show performance-focused graphics programmers how to take full advantage of multiple cores and wide SIMD units in the context of ray tracing. Embree features spatial acceleration structures and traversal algorithms that are optimized for CPUs and the Intel Xeon Phi architecture. In particular, Embree supports hybrid ray packet/single ray traversal algorithms (optimized for both CPUs and Xeon Phi) that are designed to handle both coherent and incoherent workloads efficiently [Benthin et al. 2012]. While the first version of Embree focused only on single ray traversal on SSE- or AVX-enabled CPUs, this talk specifically covers the upcoming Embree 2.0 release, which explicitly also supports the Xeon Phi architecture and adds support for packet tracing, two-level hierarchies, partial scene updates, dynamic content, and virtual intersectors for user-defined primitives.

High Performance Graphics | 2009

Efficient ray traced soft shadows using multi-frusta tracing

Carsten Benthin; Ingo Wald

Ray tracing has long been considered superior to rasterization because of its ability to trace arbitrary rays, which allows it to simulate virtually any physical light transport effect. Yet, to look plausible, effects such as soft shadows typically require extraordinary numbers of rays, which makes the prospect of real-time performance rather remote. Rasterization, in contrast, has a record of producing such effects in real time by employing specialized and approximate solutions for individual effects. Though ray tracing may still be the right choice for effects like reflections and refractions, using specialized solutions for certain important effects also makes sense for a ray tracer. In this paper, we propose a special solution for ray tracing soft shadows that is particularly targeted at Intel's Larrabee architecture. We use a specialized form of frustum tracing that traces multiple frusta of specialized light-weight shadow packets in parallel, while generating rays within each frustum on demand. The technique can easily be integrated into any packet ray tracer, and fits well into the wide SIMD and cache-size constraints of the Larrabee architecture. Our technique reaches rates of up to several dozen million rays per second per Larrabee core, outperforming traditional packet techniques by up to 6x. This high performance, combined with a simple light-weight illumination filtering step, makes it possible to achieve real-time soft shadows for game-like scenes.

High Performance Graphics | 2014

Exploiting local orientation similarity for efficient ray traversal of hair and fur

Sven Woop; Carsten Benthin; Ingo Wald; Gregory S. Johnson; Eric Tabellion

Hair and fur typically consist of a large number of thin, curved, and densely packed strands which are difficult to ray trace efficiently. A tight fitting spatial data structure, such as a bounding volume hierarchy (BVH), is needed to quickly determine which hair a ray hits. However, the large number of hairs can yield a BVH with a large memory footprint (particularly when hairs are pre-tessellated), and curved or diagonal hairs cannot be tightly bounded within axis aligned bounding boxes. In this paper, we describe an approach to ray tracing hair and fur with improved efficiency, by combining parametrically defined hairs with a BVH that uses both axis-aligned and oriented bounding boxes. This BVH exploits similarity in the orientation of neighboring hairs to increase ray culling efficiency compared to purely axis-aligned BVHs. Our approach achieves about 2x the performance of ray tracing pre-tessellated hair models, while requiring significantly less memory.

High Performance Graphics | 2015

Efficient ray tracing of subdivision surfaces using tessellation caching

Carsten Benthin; Sven Woop; Matthias Nießner; Kai Selgrad; Ingo Wald

A common way to ray trace subdivision surfaces is by constructing and traversing spatial hierarchies on top of tessellated input primitives. Unfortunately, tessellating surfaces requires a substantial amount of memory storage, and involves significant construction and memory I/O costs. In this paper, we propose a lazy-build caching scheme to efficiently handle these problems while also exploiting the capabilities of today's many-core architectures. To this end, we lazily tessellate patches only when necessary, and utilize adaptive subdivision to efficiently evaluate the underlying surface representation. The core idea of our approach is a shared lazy evaluation cache, which triggers and maintains the surface tessellation. We combine our caching scheme with SIMD-optimized subdivision primitive evaluation and fast hierarchy construction over the tessellated surface. This allows us to achieve high ray tracing performance in complex scenes, outperforming the state of the art while requiring only a fraction of the memory. In addition, our method stays within a fixed memory budget regardless of the tessellation level, which is essential for many applications such as movie production rendering. Beyond the results of this paper, we have integrated our method into Embree, an open source ray tracing framework, thus making interactive ray tracing of subdivision surfaces publicly available.
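The idea of a lazily filled cache that stays within a fixed memory budget can be sketched in a few lines. This is a single-threaded illustration only, not the shared cache described in the paper: the class name `TessellationCache`, the `tessellate` callback, the use of the vertex count as the size measure, and the least-recently-used eviction policy are all assumptions of the sketch.

```python
from collections import OrderedDict
from typing import Callable, Hashable, List


class TessellationCache:
    """Tessellates a patch on first access and evicts old entries over budget."""

    def __init__(self, budget: int, tessellate: Callable[[Hashable], List]):
        self.budget = budget          # maximum number of cached vertices
        self.tessellate = tessellate  # hypothetical callback: patch id -> vertices
        self.entries: "OrderedDict[Hashable, List]" = OrderedDict()
        self.used = 0

    def lookup(self, patch_id: Hashable) -> List:
        if patch_id in self.entries:
            # Cache hit: mark the entry as most recently used.
            self.entries.move_to_end(patch_id)
            return self.entries[patch_id]
        # Cache miss: tessellate only now, when a ray actually needs the patch.
        vertices = self.tessellate(patch_id)
        size = len(vertices)
        if size > self.budget:
            raise ValueError("a single patch exceeds the cache budget")
        # Evict least recently used entries until the new one fits.
        while self.used + size > self.budget:
            _, evicted = self.entries.popitem(last=False)
            self.used -= len(evicted)
        self.entries[patch_id] = vertices
        self.used += size
        return vertices
```

With a budget of 8 vertices and patches of 4 vertices each, the third distinct patch evicts whichever of the first two was used least recently, and a later access to the evicted patch triggers a fresh tessellation.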

High Performance Graphics | 2016

Local shading coherence extraction for SIMD-efficient path tracing on CPUs

Attila T. Áfra; Carsten Benthin; Ingo Wald; Jacob Munkberg

Accelerating ray traversal on data-parallel hardware architectures has received widespread attention over the last few years, but much less research has focused on efficient shading for ray tracing. This is unfortunate since, for many applications, shading is the single most time-consuming operation. To maximize rendering performance, it is therefore crucial to effectively use the processor's wide vector units not only for the ray traversal step itself, but also during shading. This is non-trivial, as incoherent ray distributions cause control flow divergence, making high SIMD utilization difficult to maintain. In this paper, we propose a local shading coherence extraction algorithm for CPU-based path tracing that enables efficient SIMD shading. Each core independently traces and sorts small streams of rays that fit into the on-chip cache hierarchy, which makes it possible to extract coherent ray batches requiring similar shading operations with very low overhead. We show that operating on small independent ray streams instead of a large global stream is sufficient to achieve high SIMD utilization in shading (90% on average) for complex scenes, while avoiding unnecessary memory traffic and synchronization. For a set of scenes with many different materials, our approach reduces the shading time by 1.9-3.4x compared to simple structure-of-arrays (SoA) based packet shading. The total rendering speedup varies between 1.2x and 3x, depending also on the ratio of the traversal and shading times.
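The effect of sorting a small ray stream before shading can be illustrated with a short sketch. It assumes that each ray hit carries an integer material ID and that hits with the same ID need the same shading code; the function names `extract_batches` and `simd_utilization`, and the definition of utilization as active lanes over available lanes, are assumptions of the sketch rather than the paper's implementation.

```python
from itertools import groupby
from typing import List, Sequence


def extract_batches(material_ids: Sequence[int], simd_width: int) -> List[List[int]]:
    """Group hit indices by material and split each group into SIMD-sized batches."""
    # Stable sort of hit indices by material ID.
    order = sorted(range(len(material_ids)), key=lambda i: material_ids[i])
    batches = []
    for _, group in groupby(order, key=lambda i: material_ids[i]):
        hits = list(group)
        for start in range(0, len(hits), simd_width):
            batches.append(hits[start:start + simd_width])
    return batches


def simd_utilization(batches: Sequence[Sequence[int]], simd_width: int) -> float:
    """Fraction of SIMD lanes that do useful work across all batches."""
    if not batches:
        return 0.0
    active = sum(len(batch) for batch in batches)
    return active / (len(batches) * simd_width)
```

Each returned batch contains only hits with one material, so a shader invoked on a batch runs without divergence between materials; partially filled batches are what keep utilization below 100%.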

High Performance Graphics | 2017

Improved two-level BVHs using partial re-braiding

Carsten Benthin; Sven Woop; Ingo Wald; Attila T. Áfra

We propose a novel approach for improving the quality of two-level BVHs (i.e., a two-level data structure that uses a top-level BVH built over second-level object BVHs). After building an individual, high-quality BVH for each object, our new top-level BVH build approach selectively re-braids (opens and merges) object BVHs during the build process to reduce overlap and improve SAH quality. We demonstrate that, compared to the two main state-of-the-art techniques (brute-force re-construction of a single, flat BVH, and building a traditional two-level BVH over objects, respectively), the proposed approach achieves build times significantly faster than the former, while simultaneously yielding traversal performance that is much higher than the latter.
