Thanh-Tung Cao
National University of Singapore
Publication
Featured research published by Thanh-Tung Cao.
Interactive 3D Graphics and Games | 2010
Thanh-Tung Cao; Ke Tang; Anis Mohamed; Tiow Seng Tan
We propose a Parallel Banding Algorithm (PBA) on the GPU to compute the exact Euclidean Distance Transform (EDT) of a binary image in 2D and higher dimensions. PBA partitions the image into small bands, processes them, and then merges them concurrently. This gives an exact EDT with optimal linear total work, a high level of parallelism and a good memory access pattern. This work is the first attempt to exploit the enormous power of the GPU for computing the exact EDT, whereas prior GPU approaches compute only approximations. In our experiments our exact algorithm is still a few times faster than these approximate algorithms in 2D and 3D for most input sizes. We illustrate the use of our algorithm in several applications: computing the Euclidean skeleton using the integer medial axis transform, performing morphological operations on 3D volumetric data, and constructing 2D weighted centroidal Voronoi diagrams.
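The squared EDT is separable: it can be computed one dimension at a time. As a sequential point of reference, here is a minimal sketch of an exact 2D squared EDT. It runs a column pass, then a row pass that builds the lower envelope of parabolas following Felzenszwalb and Huttenlocher. This is a swapped-in technique, not PBA's banded GPU algorithm. The function names `edt_1d` and `squared_edt`, and the convention that feature pixels hold 1, are assumptions of this sketch.

```python
INF = float("inf")


def edt_1d(f):
    """Exact 1D squared distance transform: d[x] = min over q of (x - q)^2 + f[q].

    Builds the lower envelope of the parabolas rooted at the finite entries
    of f (Felzenszwalb-Huttenlocher), then samples it at every x.
    """
    n = len(f)
    sites = [q for q in range(n) if f[q] != INF]
    if not sites:
        return [INF] * n
    v = []  # apex positions of the parabolas on the lower envelope
    z = []  # z[i]: left end of the interval where parabola v[i] is lowest
    for q in sites:
        s = -INF  # z[0] stays -inf, so the first parabola is never popped
        while v:
            p = v[-1]
            # x-coordinate where the parabolas rooted at p and q intersect
            s = ((f[q] + q * q) - (f[p] + p * p)) / (2 * q - 2 * p)
            if s <= z[-1]:
                v.pop()  # parabola p is never the lowest one; drop it
                z.pop()
            else:
                break
        v.append(q)
        z.append(s)
    d = [0] * n
    j = 0
    for x in range(n):
        while j + 1 < len(v) and z[j + 1] < x:
            j += 1
        d[x] = (x - v[j]) ** 2 + f[v[j]]
    return d


def squared_edt(image):
    """Exact squared Euclidean distance from every pixel to the nearest
    feature pixel (value 1) of a 2D binary image, computed separably."""
    rows, cols = len(image), len(image[0])
    # Pass 1: 1D transform down each column (vertical distances only).
    g = [[0 if image[r][c] else INF for c in range(cols)] for r in range(rows)]
    for c in range(cols):
        col = edt_1d([g[r][c] for r in range(rows)])
        for r in range(rows):
            g[r][c] = col[r]
    # Pass 2: 1D transform along each row, using the pass-1 values as f.
    return [edt_1d(g[r]) for r in range(rows)]
```

For example, `squared_edt([[0, 0, 0], [0, 1, 0], [0, 0, 0]])` returns `[[2, 1, 2], [1, 0, 1], [2, 1, 2]]`. Take square roots to get Euclidean distances.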
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2012
Sadegh Nobari; Thanh-Tung Cao; Panagiotis Karras; Stéphane Bressan
The proliferation of data in graph form calls for the development of scalable graph algorithms that exploit parallel processing environments. One such problem is the computation of a graph's minimum spanning forest (MSF). Past research has proposed several parallel algorithms for this problem, yet none of them scales to large, high-density graphs. In this paper we propose a novel, scalable, parallel MSF algorithm for undirected weighted graphs. Our algorithm leverages Prim's algorithm in a parallel fashion, concurrently expanding several subsets of the computed MSF. Our effort focuses on minimizing the communication among different processors without constraining the local growth of each processor's computed subtree. In effect, we achieve a scalability that previous approaches lacked. We implement our algorithm in CUDA and run it on a GPU. We study its performance on real and synthetic graph data, both sparse and dense, structured and unstructured. Our experimental study demonstrates that our algorithm outperforms the previous state-of-the-art GPU-based MSF algorithm, while being several orders of magnitude faster than sequential CPU-based algorithms.
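The parallel MSF algorithm builds on Prim's algorithm, and the sequential version is the natural baseline. The minimal sketch below restarts Prim's algorithm from every unvisited vertex, so a disconnected graph yields a spanning forest rather than a single tree. It is not the paper's concurrent, CUDA-based expansion of several subtrees. The function name `prim_msf` and the `(u, v, w)` edge format are illustrative assumptions.

```python
import heapq
from collections import defaultdict


def prim_msf(num_vertices, edges):
    """Minimum spanning forest of an undirected weighted graph.

    edges: iterable of (u, v, w) with vertices numbered 0..num_vertices-1.
    Returns the forest edges as (parent, child, weight) tuples.
    """
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((w, v))
        adj[v].append((w, u))
    visited = [False] * num_vertices
    forest = []
    for root in range(num_vertices):
        if visited[root]:
            continue
        # Grow one tree of the forest from this root with sequential Prim.
        visited[root] = True
        heap = [(w, root, v) for w, v in adj[root]]
        heapq.heapify(heap)
        while heap:
            w, u, v = heapq.heappop(heap)  # lightest edge leaving the tree
            if visited[v]:
                continue  # stale entry: v joined the tree via a lighter edge
            visited[v] = True
            forest.append((u, v, w))
            for w2, x in adj[v]:
                if not visited[x]:
                    heapq.heappush(heap, (w2, v, x))
    return forest
```

For example, `prim_msf(6, [(0, 1, 4), (1, 2, 1), (0, 2, 3), (3, 4, 2)])` returns `[(0, 2, 3), (2, 1, 1), (3, 4, 2)]`. That is three edges of total weight 6, with vertex 5 left as an isolated tree.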
IEEE Transactions on Visualization and Computer Graphics | 2013
Meng Qi; Thanh-Tung Cao; Tiow Seng Tan
We propose the first graphics processing unit (GPU) solution to compute the 2D constrained Delaunay triangulation (CDT) of a planar straight line graph (PSLG) consisting of points and edges. There are many existing CPU algorithms to solve the CDT problem in computational geometry, yet there has been no prior approach to solve this problem efficiently using the parallel computing power of the GPU. For the special case of the CDT problem where the PSLG consists of just points, which is simply the normal Delaunay triangulation (DT) problem, a hybrid approach using the GPU together with the CPU to partially speed up the computation has already been presented in the literature. Our work, on the other hand, accelerates the entire computation on the GPU. Our implementation using the CUDA programming model on NVIDIA GPUs is numerically robust, and runs up to an order of magnitude faster than the best sequential implementations on the CPU. This result is reflected in our experiment with both randomly generated PSLGs and real-world GIS data having millions of points and edges.
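Delaunay-based constructions, including CDT algorithms, typically decide whether to flip an edge using the incircle predicate. The sketch below shows that test with exact integer (or `Fraction`) arithmetic. Exact arithmetic is one way to obtain numerical robustness; it is assumed here and is not necessarily the paper's approach. A hypothetical helper `must_flip` never flips a constrained edge. The paper's GPU pipeline is not reproduced.

```python
def incircle(a, b, c, d):
    """Where d lies relative to the circumcircle of the counterclockwise
    triangle abc: > 0 inside, < 0 outside, 0 on the circle.
    Exact when the coordinates are ints or fractions.Fraction values."""
    adx, ady = a[0] - d[0], a[1] - d[1]
    bdx, bdy = b[0] - d[0], b[1] - d[1]
    cdx, cdy = c[0] - d[0], c[1] - d[1]
    alift = adx * adx + ady * ady
    blift = bdx * bdx + bdy * bdy
    clift = cdx * cdx + cdy * cdy
    # 3x3 determinant with rows (p - d, |p - d|^2), expanded along the lifts.
    return (alift * (bdx * cdy - cdx * bdy)
            + blift * (cdx * ady - adx * cdy)
            + clift * (adx * bdy - bdx * ady))


def must_flip(a, b, c, d, constrained=False):
    """Edge a-c is shared by the CCW triangles (a, b, c) and (a, c, d).

    It is not locally Delaunay when d lies strictly inside the circumcircle
    of abc. In a CDT, constrained (input) edges are kept regardless.
    """
    return not constrained and incircle(a, b, c, d) > 0
```

For example, in the quadrilateral `(0, 0), (4, 0), (4, 4), (1, 3)` the diagonal from `(0, 0)` to `(4, 4)` must be flipped unless it is a constrained edge.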
Interactive 3D Graphics and Games | 2012
Meng Qi; Thanh-Tung Cao; Tiow Seng Tan
Interactive 3D Graphics and Games | 2013
Mingcen Gao; Thanh-Tung Cao; Tiow Seng Tan; Zhiyong Huang
Flipping is a local and efficient operation for constructing the convex hull incrementally. However, it is known that the traditional flip algorithm cannot compute the convex hull when applied to a polyhedron in R3. Our novel Flip-Flop algorithm is a variant of the flip algorithm. It overcomes this deficiency of the traditional algorithm and always computes the convex hull of a given star-shaped polyhedron, with provable correctness. We apply this to constructing the convex hull of a point set in R3 and develop ffHull. It is a flip algorithm that allows nonrestrictive insertion of many vertices before any flipping of edges. This is unlike the well-known incremental approach of strictly alternating between inserting a single vertex and flipping. The new approach is not only simpler and more efficient for CPU implementation but also maps well to the massively parallel nature of the modern GPU. As shown in our experiments, ffHull running on the CPU is as fast as the best-known convex hull implementation, qHull. On the GPU, ffHull also outperforms all known prior work. From this, we further obtain the first known solution to computing the 2D regular triangulation on the GPU.
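The local operation behind convex hull flipping comes down to one orientation check. An edge shared by two outward-oriented triangles is reflex when the far vertex of one triangle lies above the plane of the other. Flipping replaces that edge with the opposite diagonal. This minimal sketch assumes faces are stored as vertex triples that appear counterclockwise when seen from outside. It shows only the local test and flip, not Flip-Flop's handling of star-shaped polyhedra or ffHull's insertion scheme. All helper names are hypothetical.

```python
def sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def side(a, b, c, d):
    """> 0 if d lies on the side of plane abc that the normal
    (b - a) x (c - a) points to, < 0 on the other side, 0 if coplanar.
    Exact for int or Fraction coordinates."""
    n = cross(sub(b, a), sub(c, a))
    w = sub(d, a)
    return n[0] * w[0] + n[1] * w[1] + n[2] * w[2]


def is_reflex(a, b, c, d):
    """Edge a-b is shared by the outward-oriented faces (a, b, c) and
    (b, a, d). It is reflex (a flip candidate) if d is strictly above abc."""
    return side(a, b, c, d) > 0


def flip(a, b, c, d):
    """Replace edge a-b by edge c-d, keeping both faces outward-oriented."""
    return [(a, d, c), (d, b, c)]
```

Each test touches only two triangles. Tests on edges whose triangles do not overlap can therefore run independently, which is what makes flipping attractive on massively parallel hardware.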
Computational Geometry: Theory and Applications | 2015
Thanh-Tung Cao; Herbert Edelsbrunner; Tiow Seng Tan
We prove that the dual of the digital Voronoi diagram constructed by flooding the plane from the data points gives a geometrically and topologically correct dual triangulation. This provides a proof of correctness for recently developed GPU algorithms that outperform traditional CPU algorithms for constructing two-dimensional Delaunay triangulations.
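The sketch below makes the objects in this result concrete. It builds a digital Voronoi diagram on a square grid and reads off its dual edges, that is, the pairs of sites whose regions touch. It assumes jump flooding as the flooding scheme and a power-of-two grid size. It extracts only dual edges, not the full dual triangulation whose correctness the paper proves. The function names are illustrative.

```python
def jump_flood(size, seeds):
    """Digital Voronoi diagram of integer seeds on a size x size grid,
    computed by jump flooding with steps size/2, size/4, ..., 1.
    Returns label[y][x] = index of the seed assigned to pixel (x, y)."""
    def dist2(x, y, s):
        sx, sy = seeds[s]
        return (x - sx) ** 2 + (y - sy) ** 2

    label = [[-1] * size for _ in range(size)]
    for i, (sx, sy) in enumerate(seeds):
        label[sy][sx] = i
    step = size // 2
    while step >= 1:
        nxt = [row[:] for row in label]
        for y in range(size):
            for x in range(size):
                best = label[y][x]
                # Look at the 3x3 neighbours at distance `step` and keep
                # whichever known seed is closest to this pixel.
                for dy in (-step, 0, step):
                    for dx in (-step, 0, step):
                        qx, qy = x + dx, y + dy
                        if 0 <= qx < size and 0 <= qy < size:
                            s = label[qy][qx]
                            if s != -1 and (best == -1 or dist2(x, y, s) < dist2(x, y, best)):
                                best = s
                nxt[y][x] = best
        label = nxt
        step //= 2
    return label


def dual_edges(label):
    """Pairs of seeds whose digital Voronoi regions share a pixel edge."""
    size = len(label)
    edges = set()
    for y in range(size):
        for x in range(size):
            for qx, qy in ((x + 1, y), (x, y + 1)):
                if qx < size and qy < size and label[y][x] != label[qy][qx]:
                    edges.add(tuple(sorted((label[y][x], label[qy][qx]))))
    return edges
```

With seeds `[(2, 2), (13, 3), (7, 12)]` on a 16 by 16 grid, the three regions are pairwise adjacent. The dual edges are therefore `{(0, 1), (0, 2), (1, 2)}`, which is the single Delaunay triangle of the three sites.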
Interactive 3D Graphics and Games | 2014
Thanh-Tung Cao; Ashwin Nanjappa; Mingcen Gao; Tiow Seng Tan
We propose the first algorithm to compute the 3D Delaunay triangulation (DT) on the GPU. Our algorithm uses massively parallel point insertion followed by bilateral flipping, a powerful local operation in computational geometry. Flipping algorithms are very amenable to parallel processing and have been employed to construct the 2D DT and the 3D convex hull on the GPU. Yet to our knowledge there has been no successful attempt to construct the 3D DT this way. This is because in 3D, when many points are inserted in parallel, flipping gets stuck long before reaching the DT, and any further correction to obtain the DT is then costly. In contrast, we show that one can still obtain a triangulation very close to Delaunay. This requires alternating between parallel point insertion and flipping, and picking an appropriate point insertion order. We further propose an adaptive star splaying approach to subsequently transform this result into the 3D DT efficiently. In addition, we introduce several GPU speedup techniques for our implementation, which are also useful for general computational geometry algorithms. Our approach is hybrid: the GPU accelerates the main work of constructing a near-Delaunay structure, and the CPU transforms that into the 3D DT. On the whole, it outperforms all existing sequential CPU algorithms by up to an order of magnitude on both synthetic and real-world inputs. We also adapt our approach to the 2D DT problem and obtain a similar speedup over the best sequential CPU algorithms, and a speedup of up to 2 times over previous GPU algorithms.
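Flipping toward the 3D DT is driven by the insphere predicate, which asks whether a fifth point lies inside the circumsphere of a tetrahedron. The minimal sketch below evaluates it exactly from a 4x4 determinant. It normalizes by the tetrahedron's orientation so that the result does not depend on vertex order. It assumes integer or `Fraction` coordinates and is not the paper's GPU implementation. Parallel insertion, the flipping schedule and star splaying are not shown.

```python
def sub(p, q):
    return [p[i] - q[i] for i in range(3)]


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def det4(m):
    """4x4 determinant by cofactor expansion along the first row."""
    total = 0
    for j in range(4):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det3(minor)
    return total


def orient3d(a, b, c, d):
    """Six times the signed volume of tetrahedron abcd."""
    return det3([sub(b, a), sub(c, a), sub(d, a)])


def insphere(a, b, c, d, e):
    """> 0 if e lies strictly inside the circumsphere of tetrahedron abcd,
    < 0 if outside, 0 if on the sphere (exact for int/Fraction inputs)."""
    o = orient3d(a, b, c, d)
    if o == 0:
        raise ValueError("a, b, c, d are coplanar")
    rows = []
    for p in (a, b, c, d):
        v = sub(p, e)
        rows.append(v + [v[0] ** 2 + v[1] ** 2 + v[2] ** 2])
    m = det4(rows)
    # With rows (p - e, |p - e|^2) the determinant is negative for an
    # interior e when orient3d > 0, so flip its sign in that case.
    return -m if o > 0 else m
```

For the tetrahedron `(0, 0, 0), (4, 0, 0), (0, 4, 0), (0, 0, 4)`, the point `(1, 1, 1)` tests inside, `(8, 8, 8)` outside, and `(4, 4, 0)` lies exactly on the sphere.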
ACM Transactions on Mathematical Software | 2013
Mingcen Gao; Thanh-Tung Cao; Ashwin Nanjappa; Tiow Seng Tan; Zhiyong Huang
A novel algorithm is presented to compute the convex hull of a point set in ℝ3 using the graphics processing unit (GPU). By exploiting the relationship between the Voronoi diagram and the convex hull, the algorithm derives an approximation of the convex hull from the former. The remaining extreme vertices of the convex hull are then found by two rounds of checking, first in the digital space and then in the continuous space. The algorithm does not need explicit locking or any other concurrency control mechanism, so it can maximize the parallelism available on the modern GPU. The implementation using the CUDA programming model on NVIDIA GPUs is exact and efficient. The experiments show that it is up to an order of magnitude faster than other sequential convex hull implementations running on the CPU for inputs of millions of points. This work demonstrates that the GPU can be used to solve nontrivial computational geometry problems with significant performance benefit.
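The derivation of an approximate hull from the Voronoi diagram is not reproduced here. Instead, the sketch below gives a simpler, swapped-in illustration of what an extreme vertex is. It relies on a standard fact: a point that is the unique maximizer of the dot product p · u over the set, for some direction u, must be a vertex of the convex hull. Sampling directions therefore yields a subset of the hull vertices, and the remaining ones still need a separate check. The function name and the choice of directions are assumptions.

```python
def extreme_vertices(points, directions):
    """Indices of points that are the unique maximizer of p . u for at
    least one direction u. Every such point is a convex hull vertex;
    hull vertices not selected by any sampled direction are missed."""
    found = set()
    for u in directions:
        scores = [p[0] * u[0] + p[1] * u[1] + p[2] * u[2] for p in points]
        best = max(scores)
        winners = [i for i, s in enumerate(scores) if s == best]
        if len(winners) == 1:  # ties may include non-vertex points, so skip
            found.add(winners[0])
    return found
```

For the eight corners of a cube plus its center and one face center, the eight diagonal directions (±1, ±1, ±1) select exactly the corners. The axis direction (0, 0, 1) is skipped because four corners and the face center tie.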
Interactive 3D Graphics and Games | 2011
Mingcen Gao; Thanh-Tung Cao; Tiow Seng Tan; Zhiyong Huang
We present a novel approach, termed gHull, to compute the convex hull of a 3D point set using the GPU. We exploit the fast computation of the digital Voronoi diagram and its relationship with the convex hull to derive the hull from the former rather than computing it directly. Our algorithm is robust while maximizing the parallelism available on the GPU to achieve a significant speedup.
IEEE Transactions on Visualization and Computer Graphics | 2017
Mingcen Gao; Thanh-Tung Cao; Tiow Seng Tan
Flip is a simple and local operation that transforms one triangulation into another. It changes only some neighboring simplices, without considering any attribute or configuration that is global to the triangulation. Thanks to this characteristic, several flips can be applied independently to different small, non-overlapping regions of one triangulation. Such an operation is favored when designing algorithms for data-parallel, massively multithreaded hardware such as the GPU. However, most existing flip algorithms are designed to be executed sequentially and usually need some restrictions on the execution order of flips, making them hard to adapt to parallel computation. In this paper, we present an in-depth study of flip algorithms in low dimensions, with an emphasis on the flexibility of their execution order. In particular, we propose a series of provably correct flip algorithms for the regular triangulation and the convex hull in 2D and 3D, with implementations for both CPUs and GPUs. Our experiments show that our GPU implementation for constructing these structures from a given point set achieves up to two orders of magnitude of speedup over other popular single-threaded CPU implementations of existing algorithms.
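The regular triangulation of weighted points (x, y, w) is the projection of the lower convex hull of the lifted points (x, y, x^2 + y^2 - w). This is why regular triangulation and 3D convex hull are so closely related. The local test that decides a 2-2 flip can be sketched as the power test below, which reduces to the ordinary incircle test when all weights are zero. This is a minimal sketch with exact integer arithmetic and hypothetical function names. It does not cover the paper's flip algorithms, their execution-order guarantees, or the flips that remove redundant points.

```python
def power_test(a, b, c, d):
    """Weighted points are (x, y, w) and abc must be counterclockwise.

    Returns > 0 if the lifted d, (x, y, x^2 + y^2 - w), lies below the plane
    through the lifted a, b, c; < 0 if above; 0 if the four are coplanar.
    With all weights zero this equals the incircle test.
    """
    def lift(p):
        return p[0] * p[0] + p[1] * p[1] - p[2]

    ld = lift(d)
    m = [[p[0] - d[0], p[1] - d[1], lift(p) - ld] for p in (a, b, c)]
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def must_flip(a, b, c, d):
    """Edge a-c, shared by CCW triangles (a, b, c) and (a, c, d), is not
    locally regular when the lifted d lies below the lifted plane of abc."""
    return power_test(a, b, c, d) > 0
```

With unweighted corners of the square `(0, 0), (4, 0), (4, 4), (0, 4)` the four points are cocircular and `power_test` returns 0. Giving the fourth corner weight 1 lowers its lifted point, so the diagonal from `(0, 0)` to `(4, 4)` must then be flipped. Weight -1 keeps it.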