
Publications


Featured research published by Anjul Patney.


International Conference on Supercomputing | 2008

Efficient computation of sum-products on GPUs through software-managed cache

Mark Silberstein; Assaf Schuster; Dan Geiger; Anjul Patney; John D. Owens

We present a technique for designing memory-bound algorithms with high data reuse on Graphics Processing Units (GPUs) equipped with close-to-ALU software-managed memory. The approach is based on the efficient use of this memory through the implementation of a software-managed cache. We also present an analytical model for performance analysis of such algorithms. We apply this technique to the implementation of a GPU-based solver for the sum-product, or marginalize a product of functions (MPF), problem, which arises in a wide variety of real-life applications in artificial intelligence, statistics, image processing, and digital communications. Our motivation to accelerate MPF originated in the context of the analysis of genetic diseases, which in some cases requires years to complete on modern CPUs. Computing MPF is similar to computing the chain matrix product of multi-dimensional matrices, but is more difficult due to a complex data-dependent access pattern, high data reuse, and a low compute-to-memory access ratio. Our GPU-based MPF solver achieves up to a 2700-fold speedup on random data and a 270-fold speedup on real-life genetic analysis datasets, comparing an NVIDIA GeForce 8800 GTX GPU against an optimized CPU version on a 2.4 GHz Intel Core 2 with a 4 MB L2 cache.
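
As a rough illustration of what an MPF solver computes (not the authors' GPU code), here is a minimal NumPy sketch that marginalizes a product of factors; the factor shapes and variable names are invented for the example.

```python
# A minimal CPU illustration of the MPF (sum-product) operation,
# not the authors' GPU solver: multiply a set of factors defined
# over overlapping variable sets, then sum out some variables.
import numpy as np

# Hypothetical factors over variable pairs (a, b), (b, c), (c, d),
# each variable with domain size 4.
rng = np.random.default_rng(0)
f_ab = rng.random((4, 4))
f_bc = rng.random((4, 4))
f_cd = rng.random((4, 4))

# Marginalize the product over b and c, keeping (a, d).
# einsum contracts the shared indices, which is exactly the
# "chain product of multi-dimensional matrices" structure
# the paper accelerates with a software-managed cache.
result = np.einsum('ab,bc,cd->ad', f_ab, f_bc, f_cd)
print(result.shape)  # (4, 4)
```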


International Conference on Computer Graphics and Interactive Techniques | 2011

Efficient maximal Poisson-disk sampling

Mohamed S. Ebeida; Andrew A. Davidson; Anjul Patney; Patrick M. Knupp; Scott A. Mitchell; John D. Owens

We solve the problem of generating a uniform Poisson-disk sampling that is both maximal and unbiased over bounded non-convex domains. To our knowledge this is the first provably correct algorithm with time and space dependent only on the number of points produced. Our method has two phases, both based on classical dart-throwing. The first phase uses a background grid of square cells to rapidly create an unbiased, near-maximal covering of the domain. The second phase completes the maximal covering by calculating the connected components of the remaining uncovered voids, and by using their geometry to efficiently place unbiased samples that cover them. The second phase converges quickly, overcoming a common difficulty in dart-throwing methods. The deterministic memory is O(n) and the expected running time is O(n log n), where n is the output size, the number of points in the final sample. Our serial implementation verifies that the log n dependence is minor, and nearly O(n) performance for both time and memory is achieved in practice. We also present a parallel implementation on GPUs to demonstrate the parallel-friendly nature of our method, which achieves 2.4x the performance of our serial version.
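
A minimal Python sketch of the first, grid-based dart-throwing phase may help make the setup concrete; it is not the authors' implementation, omits the void-filling second phase (so the result is near-maximal, not maximal), and uses illustrative parameters.

```python
# Classical dart throwing over a background grid of square cells.
# A cell side of r/sqrt(2) guarantees at most one sample per cell,
# so a bounded neighborhood check suffices.
import math, random

def dart_throwing(r, width=1.0, height=1.0, max_darts=100_000):
    cell = r / math.sqrt(2.0)
    nx, ny = int(width / cell) + 1, int(height / cell) + 1
    grid = [[None] * ny for _ in range(nx)]
    samples = []
    for _ in range(max_darts):
        x, y = random.uniform(0, width), random.uniform(0, height)
        i, j = int(x / cell), int(y / cell)
        # Check the 5x5 cell neighborhood for a conflicting sample.
        ok = True
        for di in range(-2, 3):
            for dj in range(-2, 3):
                ii, jj = i + di, j + dj
                if 0 <= ii < nx and 0 <= jj < ny and grid[ii][jj]:
                    px, py = grid[ii][jj]
                    if (px - x) ** 2 + (py - y) ** 2 < r * r:
                        ok = False
        if ok:
            grid[i][j] = (x, y)
            samples.append((x, y))
    return samples

print(len(dart_throwing(0.02)))
```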


Computer Graphics Forum | 2012

A Simple Algorithm for Maximal Poisson-Disk Sampling in High Dimensions

Mohamed S. Ebeida; Scott A. Mitchell; Anjul Patney; Andrew A. Davidson; John D. Owens

We provide a simple algorithm and data structures for d‐dimensional unbiased maximal Poisson‐disk sampling. We use an order of magnitude less memory and time than the alternatives. Our results become more favorable as the dimension increases. This allows us to produce bigger samplings. Domains may be non‐convex with holes. The generated point cloud is maximal up to round‐off error. The serial algorithm is provably bias‐free. For an output sampling of size n in fixed dimension d, we use a linear memory budget and empirical Θ(n) runtime. No known methods scale well with dimension, due to the “curse of dimensionality.” The serial algorithm is practical in dimensions up to 5, and has been demonstrated in 6d. We have efficient GPU implementations in 2d and 3d. The algorithm proceeds through a finite sequence of uniform grids. The grids guide the dart throwing and track the remaining disk‐free area. The top‐level grid provides an efficient way to test if a candidate dart is disk‐free. Our uniform grids are like quadtrees, except we delay splits and refine all leaves at once. Since the quadtree is flat it can be represented using very little memory: we just need the indices of the active leaves and a global level. Also it is very simple to sample from leaves with uniform probability.
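
The flat-quadtree idea can be sketched in a few lines of Python: the whole tree is a global level plus the indices of the active leaves, and one refinement step splits every leaf at once. The coverage test below is a placeholder; a real implementation would test each child cell against nearby accepted samples.

```python
# A minimal sketch of a "flat quadtree" refinement step in d
# dimensions: no pointers, just integer leaf indices at a shared
# global level. Each step produces 2^d children per active leaf
# and discards those the coverage predicate rejects.
import itertools

def refine(active, level, d, is_covered):
    """Split every active leaf into 2^d children; keep uncovered ones."""
    children = []
    for idx in active:
        for offs in itertools.product((0, 1), repeat=d):
            child = tuple(2 * i + o for i, o in zip(idx, offs))
            if not is_covered(child, level + 1):
                children.append(child)
    return children, level + 1

# Start with a single root cell in 3-D; the placeholder predicate
# keeps everything, so one step yields all eight children.
active, level = [(0, 0, 0)], 0
active, level = refine(active, level, 3, lambda c, l: False)
print(level, len(active))  # 1 8
```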


International Conference on Computer Graphics and Interactive Techniques | 2008

Real-time Reyes-style adaptive surface subdivision

Anjul Patney; John D. Owens

We present a GPU-based implementation of Reyes-style adaptive surface subdivision, known in Reyes terminology as the Bound/Split and Dice stages. The performance of this task is important for the Reyes pipeline to map efficiently to graphics hardware, but its recursive nature and irregular and unbounded memory requirements present a challenge to an efficient implementation. Our solution begins by characterizing Reyes subdivision as a work queue with irregular computation, targeted to a massively parallel GPU. We propose efficient solutions to these general problems by casting our solution in terms of the fundamental primitives of prefix-sum and reduction, often encountered in parallel and GPGPU environments. Our results indicate that real-time Reyes subdivision can indeed be obtained on today's GPUs. We are able to subdivide a complex model to subpixel accuracy within 15 ms. Our measured performance is several times better than that of Pixar's RenderMan. Our implementation scales well with the input size and depth of subdivision. We also address concerns of memory size and bandwidth, and analyze the feasibility of conventional ideas on screen-space buckets.
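
The scan-based work-queue pattern the paper builds on can be illustrated with a small NumPy sketch (illustrative, not the authors' CUDA code): an exclusive prefix sum over per-patch output counts gives each patch its scatter offset into the next queue, with no serial bookkeeping.

```python
# Each patch decides to CULL (emit 0), DICE (emit 1), or SPLIT
# (emit 2 children). An exclusive scan of the emit counts turns
# the decisions into contiguous scatter offsets for the next pass.
import numpy as np

decisions = np.array(["dice", "split", "cull", "split", "dice"])
counts = np.where(decisions == "split", 2,
         np.where(decisions == "dice", 1, 0))
offsets = np.concatenate(([0], np.cumsum(counts)[:-1]))  # exclusive scan
next_queue = np.empty(counts.sum(), dtype=int)
for patch, (c, o) in enumerate(zip(counts, offsets)):
    next_queue[o:o + c] = patch  # children record their parent patch
print(offsets, next_queue)      # [0 1 3 3 5] [0 1 1 3 3 4]
```

On a GPU the scan itself runs in parallel, which is why the paper phrases the whole subdivision loop in terms of prefix-sum and reduction primitives.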


International Conference on Computer Graphics and Interactive Techniques | 2016

Towards foveated rendering for gaze-tracked virtual reality

Anjul Patney; Marco Salvi; Joohwan Kim; Anton S. Kaplanyan; Chris Wyman; Nir Benty; David Luebke; Aaron E. Lefohn

Foveated rendering synthesizes images with progressively less detail outside the eye fixation region, potentially unlocking significant speedups for wide field-of-view displays, such as head mounted displays, where target framerate and resolution are increasing faster than the performance of traditional real-time renderers. To study and improve potential gains, we designed a foveated rendering user study to evaluate the perceptual abilities of human peripheral vision when viewing today's displays. We determined that filtering peripheral regions reduces contrast, inducing a sense of tunnel vision. When applying a postprocess contrast enhancement, subjects tolerated up to 2× larger blur radius before detecting differences from a non-foveated ground truth. After verifying these insights on both desktop and head mounted displays augmented with high-speed gaze-tracking, we designed a perceptual target image to strive for when engineering a production foveated renderer. Given our perceptual target, we designed a practical foveated rendering system that reduces the number of shades by up to 70% and allows coarsened shading up to 30° closer to the fovea than Guenter et al. [2012] without introducing perceivable aliasing or blur. We filter both pre- and post-shading to address aliasing from undersampling in the periphery, introduce a novel multiresolution- and saccade-aware temporal antialiasing algorithm, and use contrast enhancement to help recover peripheral details that are resolvable by our eye but degraded by filtering. We validate our system by performing another user study. Frequency analysis shows our system closely matches our perceptual target. Measurements of temporal stability show we obtain quality similar to temporally filtered non-foveated renderings.
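
The two ideas the study isolates, eccentricity-dependent filtering and a postprocess contrast enhancement, can be sketched on a 1-D signal; the filter choice and all constants below are illustrative assumptions, not the paper's renderer.

```python
# A toy 1-D model: (1) blur with a radius that grows with distance
# from the gaze point, then (2) push each value away from a local
# mean to restore some of the contrast the blur removed.
import numpy as np

def foveate(signal, gaze_idx, k_blur=0.05, k_contrast=1.5, win=15):
    out = np.empty_like(signal, dtype=float)
    for i in range(len(signal)):
        radius = int(k_blur * abs(i - gaze_idx))     # grows with eccentricity
        lo, hi = max(0, i - radius), min(len(signal), i + radius + 1)
        out[i] = signal[lo:hi].mean()                # box blur as a stand-in
    # Contrast enhancement: amplify deviation from a local mean.
    local_mean = np.convolve(out, np.ones(win) / win, mode="same")
    return local_mean + k_contrast * (out - local_mean)

sig = np.random.default_rng(1).random(200)
print(foveate(sig, gaze_idx=100)[:5])
```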


High Performance Graphics | 2009

Parallel view-dependent tessellation of Catmull-Clark subdivision surfaces

Anjul Patney; Mohamed S. Ebeida; John D. Owens

We present a strategy for performing view-adaptive, crack-free tessellation of Catmull-Clark subdivision surfaces entirely on programmable graphics hardware. Our scheme extends the concept of breadth-first subdivision, which up to this point has only been applied to parametric patches. While mesh representations designed for a CPU often involve pointer-based structures and irregular per-element storage, neither of these is well-suited to GPU execution. To solve this problem, we use a simple yet effective data structure for representing a subdivision mesh, and design a careful algorithm to update the mesh in a completely parallel manner. We demonstrate that in spite of the complexities of the subdivision procedure, real-time tessellation to pixel-sized primitives can be done. Our implementation does not rely on any approximation of the limit surface, and avoids both subdivision cracks and T-junctions in the subdivided mesh. Using the approach in this paper, we are able to perform real-time subdivision for several static as well as animated models. Rendering performance is scalable for increasingly complex models.
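
The breadth-first, pointer-free refinement pattern can be sketched on plain parameter rectangles; the real algorithm applies this pattern to Catmull-Clark faces with crack-free rules, and the split test below is a placeholder.

```python
# Breadth-first refinement with a flat working set: every pass
# visits all current elements at once and emits the next flat
# list, so the representation maps to GPU arrays rather than a
# recursive, pointer-based tree.
def refine_breadth_first(needs_split, max_passes=10):
    queue = [(0.0, 1.0, 0.0, 1.0)]          # one root (u, v) domain
    for _ in range(max_passes):
        nxt = []
        for (u0, u1, v0, v1) in queue:
            if needs_split(u0, u1, v0, v1):
                um, vm = (u0 + u1) / 2, (v0 + v1) / 2
                nxt += [(u0, um, v0, vm), (um, u1, v0, vm),
                        (u0, um, vm, v1), (um, u1, vm, v1)]
            else:
                nxt.append((u0, u1, v0, v1))
        if len(nxt) == len(queue):           # nothing split: converged
            break
        queue = nxt
    return queue

# Placeholder split test: refine until each domain edge is <= 0.25.
leaves = refine_breadth_first(lambda u0, u1, v0, v1: (u1 - u0) > 0.25)
print(len(leaves))  # 16
```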


Computer-Aided Design | 2011

Efficient and good Delaunay meshes from random points

Mohamed S. Ebeida; Scott A. Mitchell; Andrew A. Davidson; Anjul Patney; Patrick M. Knupp; John D. Owens

We present a Conforming Delaunay Triangulation (CDT) algorithm based on maximal Poisson-disk sampling. Points are unbiased, meaning the probability of introducing a vertex in a disk-free subregion is proportional to its area, except in a neighborhood of the domain boundary. In contrast, Delaunay refinement CDT algorithms place points dependent on the geometry of empty circles in intermediate triangulations, usually near the circle centers. Unconstrained angles in our mesh are between 30° and 120°, matching some biased CDT methods. Points are placed on the boundary using a one-dimensional maximal Poisson disk sampling. Any triangulation method producing angles bounded away from 0° and 180° must have some bias near the domain boundary to avoid placing vertices infinitesimally close to the boundary. Random meshes are preferred for some simulations, such as fracture simulations where cracks must follow mesh edges, because deterministic meshes may introduce non-physical phenomena. An ensemble of random meshes aids simulation validation. Poisson-disk triangulations also avoid some graphics rendering artifacts, and have the blue-noise property. We mesh two-dimensional domains that may be non-convex with holes, required points, and multiple regions in contact. Our algorithm is also fast and uses little memory. We have recently developed a method for generating a maximal Poisson distribution of n output points, where n = Θ(Area/r²) and r is the sampling radius. It takes O(n) memory and O(n log n) expected time; in practice the time is nearly linear. This, or a similar subroutine, generates our random points. Except for this subroutine, we provably use O(n) time and space. The subroutine gives the location of points in a square background mesh. Given this, the neighborhood of each point can be meshed independently in constant time. These features facilitate parallel and GPU implementations. Our implementation works well in practice as illustrated by several examples and comparison to Triangle. Highlights: a conforming Delaunay triangulation algorithm based on maximal Poisson-disk sampling; angles between 30° and 120°; two-dimensional non-convex domains with holes, planar straight-line graphs; O(n) space and O(n log n) expected time, efficient in practice; background squares ensure all computations are local.
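
For a concrete, if simplified, picture of the pipeline: generate Poisson-disk points, then Delaunay-triangulate them. The sketch below assumes SciPy is available and uses naive dart throwing plus a single global triangulation, whereas the paper meshes each point's neighborhood locally in constant time.

```python
# Generate (near-)maximal Poisson-disk points with naive dart
# throwing, then build a Delaunay triangulation of them. The
# global scipy.spatial.Delaunay call is just a CPU stand-in for
# the paper's local, background-square-based meshing.
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(2)
r, pts = 0.05, []
for _ in range(20000):                       # naive O(n^2) dart throwing
    p = rng.random(2)
    if all((p[0]-q[0])**2 + (p[1]-q[1])**2 >= r*r for q in pts):
        pts.append(p)

tri = Delaunay(np.array(pts))
print(len(pts), "points,", len(tri.simplices), "triangles")
```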


International Conference on Computer Graphics and Interactive Techniques | 2016

Perceptually-based foveated virtual reality

Anjul Patney; Joohwan Kim; Marco Salvi; Anton S. Kaplanyan; Chris Wyman; Nir Benty; Aaron E. Lefohn; David Luebke

Humans have two distinct vision systems: foveal and peripheral vision. Foveal vision is sharp and detailed, while peripheral vision lacks fidelity. The difference in characteristics of the two systems enables recently popular foveated rendering systems, which seek to increase rendering performance by lowering image quality in the periphery. We present a set of perceptually-based methods for improving foveated rendering running on a prototype virtual reality headset with an integrated eye tracker. Foveated rendering has previously been demonstrated in conventional displays, but has recently become an especially attractive prospect in virtual reality (VR) and augmented reality (AR) display settings with large field-of-view (FOV) and high frame rate requirements. Investigating prior work on foveated rendering, we find that some previous quality-reduction techniques can create objectionable artifacts like temporal instability and contrast loss. Our Emerging Technologies installation demonstrates these techniques running live in a head-mounted display and compares them against our new perceptually-based foveated techniques, which enable a significant reduction in rendering cost with no discernible difference in visual quality.


High Performance Graphics | 2012

High-quality parallel depth-of-field using line samples

Stanley Tzeng; Anjul Patney; Andrew A. Davidson; Mohamed S. Ebeida; Scott A. Mitchell; John D. Owens

We present a parallel method for rendering high-quality depth-of-field effects using continuous-domain line samples, and demonstrate its high performance on commodity GPUs. Our method runs at interactive rates and has very low noise. Our exploration of the problem carefully considers implementation alternatives, and transforms an originally unbounded storage requirement into a small fixed requirement using heuristics to maintain quality. We also propose a novel blur-dependent level-of-detail scheme that helps accelerate rendering without undesirable artifacts. Our method consistently runs 4 to 5× faster than an equivalent point sampler while producing better image quality. Our method draws parallels to related work in rendering multi-fragment effects.
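
The blur-dependent level-of-detail scheme keys off how defocused a primitive is; the standard thin-lens circle-of-confusion radius, sketched below with illustrative parameters, is the usual way to quantify that (the paper's exact heuristics are not reproduced here).

```python
# Standard thin-lens circle-of-confusion model: a point far from
# the focal plane projects to a large blur disk on the sensor, so
# it can be represented more coarsely. Variable names and values
# are illustrative, not taken from the paper.
def coc_radius(z, z_focus, focal_len, aperture):
    """Circle-of-confusion radius on the sensor for a point at depth z."""
    return abs(aperture * focal_len * (z - z_focus) /
               (z * (z_focus - focal_len)))

# A point twice as far as the focal plane blurs noticeably;
# a point exactly on the focal plane does not.
print(coc_radius(4.0, 2.0, 0.05, 0.025))  # ~0.00032
print(coc_radius(2.0, 2.0, 0.05, 0.025))  # 0.0
```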


Eurographics | 2010

Fragment-parallel composite and filter

Anjul Patney; Stanley Tzeng; John D. Owens

We present a strategy for parallelizing the composite and filter operations suitable for an order‐independent rendering pipeline implemented on a modern graphics processor. Conventionally, this task is parallelized across pixels/subpixels, but serialized along individual depth layers. However, our technique extends the domain of parallelization to individual fragments (samples), avoiding a serial dependence on the number of depth layers, which can be a constraint for scenes with high depth complexity. As a result, our technique scales with the number of fragments and can sustain a consistent and predictable throughput in scenes with both low and high depth complexity, including those with a high variability of depth complexity within a single frame. We demonstrate composite/filter performance in excess of 50M fragments/sec for scenes with more than 1500 semi‐transparent layers.
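
The key observation enabling fragment-level parallelism is that premultiplied "over" compositing is associative, so depth-sorted fragments can be combined by a parallel reduction or scan rather than a serial front-to-back loop; a minimal Python sketch:

```python
# "Over" on premultiplied (color, alpha) pairs is associative, so
# any grouping of depth-sorted fragments gives the same result.
# A GPU can therefore combine disjoint runs of fragments in
# parallel; reduce() below stands in for that tree reduction.
from functools import reduce

def over(f, g):
    """Combine two (premultiplied_color, alpha) fragments, f in front."""
    c1, a1 = f
    c2, a2 = g
    return (c1 + (1.0 - a1) * c2, a1 + (1.0 - a1) * a2)

# Depth-sorted, semi-transparent fragments for one pixel.
frags = [(0.4, 0.5), (0.3, 0.5), (0.2, 0.25)]
print(reduce(over, frags))  # (0.6, 0.8125)
```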

Collaboration


Dive into Anjul Patney's collaborations.

Top Co-Authors

John D. Owens
University of California

Mohamed S. Ebeida
Sandia National Laboratories

Scott A. Mitchell
Sandia National Laboratories

Joohwan Kim
University of California

Stanley Tzeng
University of California