Qiming Hou
Zhejiang University
Publications
Featured research published by Qiming Hou.
international conference on computer graphics and interactive techniques | 2008
Kun Zhou; Qiming Hou; Rui Wang; Baining Guo
We present an algorithm for constructing kd-trees on GPUs. This algorithm achieves real-time performance by exploiting the GPU's streaming architecture at all stages of kd-tree construction. Unlike previous parallel kd-tree algorithms, our method builds tree nodes completely in BFS (breadth-first search) order. We also develop a special strategy for large nodes at upper tree levels to further exploit the fine-grained parallelism of GPUs. For these nodes, we parallelize the computation over all geometric primitives instead of over nodes at each level. Finally, to maintain kd-tree quality, we introduce novel schemes for fast evaluation of node split costs. As far as we know, ours is the first real-time kd-tree algorithm on the GPU. The kd-trees built by our algorithm are comparable in quality to those constructed by off-line CPU algorithms. In terms of speed, our algorithm is significantly faster than well-optimized single-core CPU algorithms and competitive with multi-core CPU algorithms. Our algorithm provides a general way of handling dynamic scenes on the GPU. We demonstrate its potential in applications involving dynamic scenes, including GPU ray tracing, interactive photon mapping, and point cloud modeling.
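As a concrete illustration of the breadth-first construction order, here is a minimal CPU sketch; the struct and function names (Node, buildLevelBFS) and the median split are ours, not the paper's, and the sequential loop stands in for one fully data-parallel GPU pass per tree level.

```cpp
#include <cstddef>
#include <vector>

// Illustrative node: a range into a global primitive index array.
struct Node {
    std::size_t begin, end;
    int         splitAxis;   // -1 marks a leaf
    float       splitPos;
};

// One BFS pass: every node of the current level is processed
// independently, which is what maps to a single GPU kernel launch.
std::vector<Node> buildLevelBFS(const std::vector<Node>& level,
                                std::vector<Node>& tree,
                                std::size_t leafSize) {
    std::vector<Node> next;
    for (const Node& n : level) {                // parallel-for on the GPU
        if (n.end - n.begin <= leafSize) {
            tree.push_back({n.begin, n.end, -1, 0.0f});
            continue;
        }
        std::size_t mid = (n.begin + n.end) / 2; // stand-in for the paper's
                                                 // fast split-cost schemes
        tree.push_back({n.begin, n.end, 0, 0.0f});
        next.push_back({n.begin, mid, -1, 0.0f});
        next.push_back({mid, n.end, -1, 0.0f});
    }
    return next;                                 // becomes the next BFS level
}
```

For the large nodes near the root, the paper instead parallelizes over the primitives inside each node, since at those levels there are too few nodes for per-node parallelism to occupy the GPU.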
international conference on computer graphics and interactive techniques | 2014
Chen Cao; Qiming Hou; Kun Zhou
We present a fully automatic approach to real-time facial tracking and animation with a single video camera. Our approach does not need any calibration for each individual user. It learns a generic regressor from public image datasets, which can be applied to any user and arbitrary video cameras to infer accurate 2D facial landmarks as well as the 3D facial shape from 2D video frames. The inferred 2D landmarks are then used to adapt the camera matrix and the user identity to better match the facial expressions of the current user. The regression and adaptation are performed in an alternating manner. With more and more facial expressions observed in the video, the whole process converges quickly with accurate facial tracking and animation. In experiments, our approach demonstrates a level of robustness and accuracy on par with state-of-the-art techniques that require a time-consuming calibration step for each individual user, while running at 28 fps on average. We consider our approach to be an attractive solution for wide deployment in consumer-level applications.
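The alternating regression and adaptation can be pictured as the following per-frame loop. This is a schematic sketch only; every type and function name is a placeholder we introduce, not the paper's interface.

```cpp
#include <vector>

// Illustrative stand-ins for the quantities the tracker maintains.
struct Frame {}; struct Landmarks {}; struct Camera {}; struct Identity {};

Landmarks regress2D(const Frame&, const Camera&, const Identity&) { return {}; }
Camera    adaptCamera(const Landmarks&, Camera c)     { return c; }
Identity  adaptIdentity(const Landmarks&, Identity i) { return i; }

void track(const std::vector<Frame>& video, Camera cam, Identity id) {
    for (const Frame& f : video) {
        Landmarks lm = regress2D(f, cam, id); // generic, user-independent
        cam = adaptCamera(lm, cam);           // fit camera matrix to landmarks
        id  = adaptIdentity(lm, id);          // fit user identity to landmarks
        // As more expressions are observed, cam and id converge, so the
        // regressed landmarks become increasingly accurate for this user.
    }
}
```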
international conference on computer graphics and interactive techniques | 2008
Qiming Hou; Kun Zhou; Baining Guo
We present BSGP, a new programming language for general-purpose computation on the GPU. A BSGP program looks much the same as a sequential C program. Programmers only need to supply a bare minimum of extra information to describe parallel processing on GPUs. As a result, BSGP programs are easy to read, write, and maintain. Moreover, the ease of programming does not come at the cost of performance: a well-designed BSGP compiler converts BSGP programs to kernels and combines them using optimally allocated temporary streams. In our benchmark, BSGP programs achieve similar or better performance than well-optimized CUDA programs, while the source code complexity and programming time are significantly reduced. To test BSGP's code efficiency and ease of programming, we implemented a variety of GPU applications, including a highly sophisticated X3D parser that would be extremely difficult to develop with existing GPU programming languages.
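To make the kernel-splitting step concrete: in a bulk-synchronous program, any value that is live across a barrier must be spilled to a temporary stream before the barrier and reloaded after it. The sketch below mimics this in plain C++ (BSGP's actual syntax differs; the sequential loops stand in for parallel thread execution).

```cpp
#include <vector>

// What the programmer conceptually writes is one sequential-looking
// function: x = f(rank); barrier; y = g(neighbor's x).
// What a BSGP-style compiler emits is two kernels, with x spilled to a
// temporary stream across the barrier. A real compiler minimizes the
// number and size of such temporaries.

std::vector<int> tmp; // compiler-allocated temporary stream

void kernel1(int nThreads) {            // everything before the barrier
    tmp.resize(nThreads);
    for (int rank = 0; rank < nThreads; ++rank)
        tmp[rank] = rank * 2;           // x = f(rank), spilled to tmp
}

void kernel2(int nThreads, std::vector<int>& out) { // after the barrier
    out.resize(nThreads);
    for (int rank = 0; rank < nThreads; ++rank)
        out[rank] = tmp[(rank + 1) % nThreads];     // reads neighbor's x
}
```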
international conference on computer graphics and interactive techniques | 2009
Kun Zhou; Qiming Hou; Zhong Ren; Minmin Gong; Xin Sun; Baining Guo
We present RenderAnts, the first system that enables interactive Reyes rendering on GPUs. Taking RenderMan scenes and shaders as input, our system first compiles RenderMan shaders to GPU shaders. Then all stages of the basic Reyes pipeline, including bounding/splitting, dicing, shading, sampling, compositing and filtering, are executed on GPUs using carefully designed data-parallel algorithms. Advanced effects such as shadows, motion blur and depth-of-field can also be rendered. In order to avoid exhausting GPU memory, we introduce a novel dynamic scheduling algorithm to bound the memory consumption during rendering. The algorithm automatically adjusts the amount of data being processed in parallel at each stage so that all data can be maintained in the available GPU memory. This allows our system to maximize the parallelism in all individual stages of the pipeline and achieve superior performance. We also propose a multi-GPU scheduling technique based on work stealing so that the system can support scalable rendering on multiple GPUs. The scheduler is designed to minimize inter-GPU communication and balance workloads among GPUs. We demonstrate the potential of RenderAnts using several complex RenderMan scenes and an open-source movie entitled Elephants Dream. Compared to Pixar's PRMan, our system can generate images of comparably high quality, but is over one order of magnitude faster. For moderately complex scenes, the system allows the user to change the viewpoint, lights and materials while producing photorealistic results at interactive speed.
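The core trade-off behind the dynamic scheduling can be reduced to a one-stage sketch. The function below is our simplification (all names are ours); the actual scheduler reasons about all pipeline stages jointly.

```cpp
#include <algorithm>
#include <cstddef>

// Before launching a stage, shrink the number of items processed in
// parallel until the stage's estimated footprint fits in the GPU
// memory that is still free.
std::size_t chooseBatchSize(std::size_t pending,      // items waiting here
                            std::size_t bytesPerItem, // per-item footprint
                            std::size_t freeGpuBytes) {
    std::size_t maxFit = freeGpuBytes / bytesPerItem;
    return std::min(pending, std::max<std::size_t>(1, maxFit));
}
// A larger batch means more parallelism but higher peak memory; the
// scheduler picks the largest batch that still fits.
```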
IEEE Transactions on Visualization and Computer Graphics | 2011
Qiming Hou; Xin Sun; Kun Zhou; Christian Lauterbach; Dinesh Manocha
Recent GPU algorithms for constructing spatial hierarchies have achieved promising performance for moderately complex models by using the breadth-first search (BFS) construction order. While being able to exploit the massive parallelism on the GPU, the BFS order also consumes excessive GPU memory, which becomes a serious issue for interactive applications involving very complex models with more than a few million triangles. In this paper, we propose to use the partial breadth-first search (PBFS) construction order to control memory consumption while maximizing performance. We apply the PBFS order to two hierarchy construction algorithms. The first algorithm is for kd-trees and automatically balances the level of parallelism against intermediate memory usage. With PBFS, peak memory consumption during construction can be efficiently controlled without costly CPU-GPU data transfer. We also develop memory allocation strategies to effectively limit memory fragmentation. The resulting algorithm scales well with GPU memory and constructs kd-trees of models with millions of triangles at interactive rates on GPUs with 1 GB memory. Compared with existing algorithms, our algorithm is an order of magnitude more scalable for a given GPU memory bound. The second algorithm is for out-of-core bounding volume hierarchy (BVH) construction for very large scenes based on the PBFS construction order. At each iteration, all constructed nodes are dumped to the CPU memory, and the GPU memory is freed for the next iteration's use. In this way, the algorithm is able to build trees that are too large to be stored in the GPU memory. Experiments show that our algorithm can construct BVHs for scenes with up to 20M triangles, several times larger than what previous GPU algorithms can handle.
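A minimal sketch of the PBFS idea (data layout and names are ours): where full BFS expands the entire node front at once, PBFS expands at most a budgeted number of nodes per iteration, which caps peak memory while keeping each iteration fully parallel.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <vector>

struct BuildNode { /* primitive range, bounds, ... */ };

void buildPBFS(std::deque<BuildNode> front, std::size_t budgetNodes) {
    while (!front.empty()) {
        // Take only as many front nodes as the memory budget allows.
        std::size_t batch = std::min(front.size(), budgetNodes);
        std::vector<BuildNode> level(front.begin(), front.begin() + batch);
        front.erase(front.begin(), front.begin() + batch);
        // Expand `level` in parallel on the GPU; children of unfinished
        // nodes go back onto the front. In the out-of-core BVH variant,
        // finished nodes are dumped to CPU memory here and their GPU
        // storage is freed before the next iteration.
        for (const BuildNode& n : level) { (void)n; /* split ... */ }
    }
}
```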
pacific conference on computer graphics and applications | 2007
Kun Zhou; Qiming Hou; Minmin Gong; John Snyder; Baining Guo; Heung-Yeung Shum
We present a new, general, and real-time technique for soft global illumination in low-frequency environmental lighting. It accumulates over relatively few spherical proxies which approximate the light blocking and re-radiating effect of dynamic geometry. Soft shadows are computed by accumulating log visibility vectors for each sphere proxy as seen by each receiver point. Inter-reflections are computed by accumulating vectors representing the proxy's unshadowed radiance when illuminated by the environment. Both vectors capture low-frequency directional dependence using the spherical harmonic basis. We also present a new proxy accumulation strategy that splats each proxy to receiver pixels in image space to collect its shadowing and indirect lighting contribution. Our soft GI rendering pipeline unifies direct and indirect soft effects with a simple accumulation strategy that maps entirely to the GPU and outperforms previous vertex-based methods.

We describe a new, analytic approximation to the airlight integral from scattering media whose density is modeled as a sum of Gaussians. The approximation supports real-time rendering of inhomogeneous media including their shadowing and scattering effects. For each Gaussian, this approximation samples the scattering integrand at the projection of its center along the view ray but models attenuation and shadowing with respect to the other Gaussians by integrating density along the fixed path from the light source through the Gaussian's 3D center to the view point. Our method handles isotropic, single-scattering media illuminated by point light sources or low-frequency lighting environments. We also generalize models for reflectance of surfaces from constant-density to inhomogeneous media, using simple optical depth averaging in the direction of the light source or all around the receiver point. Our real-time renderer is incorporated into a system for real-time design and preview of realistic animated fog, steam, or smoke.
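A compressed sketch of the log-visibility accumulation in the first technique. The SH band count and the trivial shExp stand-in below are our assumptions: the point is that because visibilities multiply, their logarithms add, so per-proxy log-visibility vectors can be summed and exponentiated once at the end.

```cpp
#include <array>

constexpr int SH_COEFFS = 16;                 // 4 SH bands, illustrative
using SHVec = std::array<float, SH_COEFFS>;

// Stand-ins: the paper derives the per-proxy quantity analytically and
// evaluates SH exponentiation with an efficient approximation.
SHVec logVisibilityOfProxy(int /*proxyId*/) { return SHVec{}; }
SHVec shExp(const SHVec& v)                 { return v; } // placeholder

SHVec accumulatedVisibility(int numProxies) {
    SHVec sum{};                       // zero log-visibility = unoccluded
    for (int p = 0; p < numProxies; ++p) {
        SHVec lv = logVisibilityOfProxy(p);
        for (int i = 0; i < SH_COEFFS; ++i)
            sum[i] += lv[i];           // product of visibilities becomes
    }                                  // a sum of log-visibilities
    return shExp(sum);                 // single exponentiation at the end
}
```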
international conference on computer graphics and interactive techniques | 2010
Qiming Hou; Hao Qin; Wenyao Li; Baining Guo; Kun Zhou
We present a micropolygon ray tracing algorithm that is capable of efficiently rendering high quality defocus and motion blur effects. A key component of our algorithm is a BVH (bounding volume hierarchy) based on 4D hyper-trapezoids that project into 3D OBBs (oriented bounding boxes) in the spatial dimensions. This acceleration structure provides tight bounding volumes for scene geometry, and is thus efficient in pruning intersection tests during ray traversal. More importantly, it can exploit the natural coherence in the time dimension of motion blurred scenes. The structure can be quickly constructed by utilizing the micropolygon grids generated during micropolygon tessellation. Ray tracing of defocused and motion blurred scenes is efficiently performed by traversing the structure. Both the BVH construction and ray traversal are easily implemented on GPUs and integrated into a GPU-based micropolygon renderer. In our experiments, our ray tracer performs up to an order of magnitude faster than state-of-the-art rasterizers while consistently delivering an image quality equivalent to a maximum-quality rasterizer. We also demonstrate that the ray tracing algorithm can be extended to handle a variety of effects, such as secondary ray effects and transparency.
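The benefit of the 4D bounds comes from evaluating them at the ray's own time rather than over the whole shutter interval. A simplified sketch of that evaluation (we use axis-aligned boxes for brevity; the paper's hyper-trapezoids project to oriented boxes):

```cpp
struct AABB { float lo[3], hi[3]; };

// A time-varying bound stores bounds at shutter-open (t0) and
// shutter-close (t1); for a ray with time t in [0,1] the spatial bound
// is their interpolation, which is much tighter than the union of the
// geometry's positions over the whole shutter interval.
AABB boundsAtTime(const AABB& t0, const AABB& t1, float t) {
    AABB b;
    for (int a = 0; a < 3; ++a) {
        b.lo[a] = (1 - t) * t0.lo[a] + t * t1.lo[a];
        b.hi[a] = (1 - t) * t0.hi[a] + t * t1.hi[a];
    }
    return b;
}
```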
international conference on computer graphics and interactive techniques | 2009
Qiming Hou; Kun Zhou; Baining Guo
We present a novel framework for debugging GPU stream programs through automatic dataflow recording and visualization. Our debugging system helps programmers locate errors that are common in general-purpose stream programs but very difficult to debug with existing tools. A stream program is first compiled into an instrumented program: the instrumenting compiler automatically adds dataflow recording code to the original program, saving the information of all GPU memory operations into log files. The resulting stream program is then executed on the GPU. With dataflow recording, our debugger automatically detects common memory errors such as out-of-bounds access, uninitialized data access, and race conditions. When the instrumented program terminates, either normally or due to an error, a dataflow visualizer is launched that allows the user to examine the memory operation history of all threads and values in all streams. The user can thus analyze error sources by tracing through relevant threads and streams using the recorded dataflow. A key ingredient of our debugging framework is the GPU interrupt, a novel mechanism that we introduce to support CPU function calls from inside GPU code. We enable interrupts on the GPU by designing a specialized compilation algorithm that translates these interrupts into GPU kernels and CPU management code. Dataflow recording involving disk I/O operations can thus be implemented as interrupt handlers. The GPU interrupt mechanism also allows the programmer to discover errors in more active ways by developing customized debugging functions that can be used directly in GPU code. As examples we show two such functions: assert for data verification and watch for visualizing intermediate results.
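The dataflow-recording idea, reduced to a CPU-side sketch with names and log layout of our own choosing: every memory write is routed through a wrapper that appends a record, and post-mortem passes over the log detect patterns such as two unordered writes by different threads to the same address.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// One log record per memory operation. On the GPU, appending to the
// log is what the paper's interrupt mechanism makes possible.
struct MemOp { int thread; std::size_t addr; float value; };
std::vector<MemOp> gLog;

void recordedStore(int thread, std::vector<float>& buf,
                   std::size_t addr, float value) {
    gLog.push_back({thread, addr, value}); // compiler-inserted recording
    buf[addr] = value;                     // the original store
}

// Post-mortem check: flag writes to the same address by different
// threads, a necessary condition for a write-write race.
void reportRaces() {
    for (std::size_t i = 0; i < gLog.size(); ++i)
        for (std::size_t j = i + 1; j < gLog.size(); ++j)
            if (gLog[i].addr == gLog[j].addr &&
                gLog[i].thread != gLog[j].thread)
                std::printf("possible race at address %zu\n", gLog[i].addr);
}
```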
IEEE Transactions on Visualization and Computer Graphics | 2011
Xin Sun; Qiming Hou; Zhong Ren; Kun Zhou; Baining Guo
We present a real-time algorithm to render all-frequency radiance transfer at both the macroscale and the mesoscale. At the mesoscale, shading is computed on a per-pixel basis by integrating the product of the local incident radiance and a bidirectional texture function (BTF). At the macroscale, the precomputed transfer matrix, which transfers the global incident radiance to the local incident radiance at each vertex, is losslessly compressed by a novel biclustering technique. The biclustering is applied directly to the radiance transfer represented in a pixel basis, on which the BTF is naturally defined. It exploits the coherence in the transfer matrix and a property of matrix element values to reduce both storage and runtime computation cost. Our new algorithm renders realistic materials and shadows under all-frequency direct environment lighting at real-time frame rates. Comparisons show that our algorithm generates images that compare favorably with reference ray tracing results, and has obvious advantages over alternative methods in storage and preprocessing time.
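The runtime advantage of biclustering can be seen in how one bicluster is applied. The rank-1 entry model below is our illustrative assumption, not the paper's exact representation (the paper exploits repeated element values for a lossless decomposition), but the cost argument is the same.

```cpp
#include <cstddef>
#include <vector>

// A bicluster: a set of rows and a set of columns of the transfer
// matrix whose entries are reconstructed from far fewer values than
// rows.size() * cols.size(). Summing all biclusters reproduces the
// matrix exactly, so the compression is lossless.
struct Bicluster {
    std::vector<int>   rows, cols;
    std::vector<float> rowScale, colValue; // entry(r,c) = rowScale[r] * colValue[c]
};

// Apply one bicluster: out += B * in, restricted to its rows/columns.
// Cost is O(|rows| + |cols|) instead of O(|rows| * |cols|).
void applyBicluster(const Bicluster& b, const std::vector<float>& in,
                    std::vector<float>& out) {
    float dot = 0;
    for (std::size_t c = 0; c < b.cols.size(); ++c)
        dot += b.colValue[c] * in[b.cols[c]];
    for (std::size_t r = 0; r < b.rows.size(); ++r)
        out[b.rows[r]] += b.rowScale[r] * dot;
}
```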
IEEE Transactions on Visualization and Computer Graphics | 2014
Hao Qin; Menglei Chai; Qiming Hou; Zhong Ren; Kun Zhou
We present a cone-based ray tracing algorithm for high-quality rendering of furry objects with reflection, refraction and defocus effects. By aggregating many sampling rays in a pixel as a single cone, we significantly reduce the high supersampling rate required by the thin geometry of fur fibers. To reduce the cost of intersecting fur fibers with cones, we construct a bounding volume hierarchy for the fiber geometry to find the fibers potentially intersecting with cones, and use a set of connected ribbons to approximate the projections of these fibers on the image plane. The computational cost of compositing and filtering transparent samples within each cone is effectively reduced by approximating away in-cone variations of shading, opacity and occlusion. The result is a highly efficient ray tracing algorithm for furry objects which is able to render images of quality comparable to those generated by alternative methods, while significantly reducing the rendering time. We demonstrate the rendering quality and performance of our algorithm using several examples and a user study.
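A simplified flavor of the cone-based test. The paper intersects cones with connected ribbons approximating the fibers' projections; the sketch below, with all names ours, only tests one point on a fiber axis, but it shows the core idea: the cone's linearly growing radius is inflated by the fiber radius, so one cone test replaces many per-ray tests against thin geometry.

```cpp
struct Vec3 { float x, y, z; };
static float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static Vec3  sub(Vec3 a, Vec3 b) { return {a.x-b.x, a.y-b.y, a.z-b.z}; }

// A cone replaces a bundle of sampling rays; its radius grows linearly
// with the distance t from the apex: radius(t) = spread * t.
// dir is assumed to be unit length.
struct Cone { Vec3 origin, dir; float spread; };

// A fiber point p of radius fiberRadius is potentially hit wherever its
// distance to the cone axis is below radius(t) + fiberRadius.
bool coneMayHitFiber(const Cone& c, Vec3 p, float fiberRadius) {
    Vec3  op = sub(p, c.origin);
    float t  = dot(op, c.dir);                 // closest point on the axis
    if (t < 0) return false;                   // behind the apex
    Vec3  foot = {c.origin.x + t*c.dir.x, c.origin.y + t*c.dir.y,
                  c.origin.z + t*c.dir.z};
    float d2 = dot(sub(p, foot), sub(p, foot));
    float r  = c.spread * t + fiberRadius;     // inflated radius
    return d2 <= r * r;
}
```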