Tony Tung | Researchain

Publication

Featured research published by Tony Tung.

International Journal of Shape Modeling | 2005

The augmented multiresolution Reeb graph approach for content-based retrieval of 3D shapes

Tony Tung; Francis J. M. Schmitt

This article presents a 3D shape matching method for 3D mesh models, applied to content-based search in databases of 3D objects. The approach is based on the multiresolution Reeb graph (MRG) proposed by Hilaga et al. [1]. The MRG provides a rich representation of shapes that, in particular, embeds the object topology. In our framework, we consider 3D mesh models of various geometric complexity and resolution, with color texture maps when available. The original approach, mainly based on the 3D object topology, is not accurate enough to obtain satisfactory matching. We therefore propose to reinforce the topological consistency conditions of the matching and to merge geometric and visual information into the graph, in order to improve matching and the calculation of shape similarity between models. In addition, all these new attributes can be freely weighted to fit the user's requirements for object retrieval. We obtain a flexible multiresolution and multicriteria representation that we call the augmented multiresolution Reeb graph (aMRG). The approach has been tested and compared with other methods, and it proves highly effective for the retrieval and classification of similar 3D shapes.
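To make the Reeb graph idea concrete, here is a minimal sketch of one resolution level of an MRG-style graph. It assumes a per-vertex function `mu` has already been computed and normalized to [0, 1] (Hilaga et al. use a normalized integral of geodesic distances). The range of `mu` is cut into equal intervals, each connected piece of the mesh inside one interval becomes a node, and nodes touching across adjacent intervals are linked. The function name `reeb_graph_level` is hypothetical, and none of the aMRG's geometric or texture attributes are included.

```python
from collections import defaultdict


def reeb_graph_level(mu, edges, levels):
    """Nodes and edges of a single-resolution Reeb graph on a mesh.

    mu:     per-vertex function values, assumed already normalized to [0, 1]
    edges:  mesh edges as (i, j) vertex-index pairs
    levels: number of equal-width intervals used to cut the range of mu
    """
    n = len(mu)
    # Interval index of each vertex; mu == 1.0 falls into the last interval.
    band = [min(int(m * levels), levels - 1) for m in mu]

    # Union-find: merge mesh neighbours lying in the same interval.
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for i, j in edges:
        if band[i] == band[j]:
            parent[find(i)] = find(j)

    # Each connected component inside one interval becomes a graph node.
    node_of = {}
    members = defaultdict(list)
    for v in range(n):
        node = node_of.setdefault((band[v], find(v)), len(node_of))
        members[node].append(v)

    # Link nodes whose components touch across adjacent intervals. Mesh edges
    # that skip an interval are ignored here; a full implementation would
    # subdivide them at the interval boundaries.
    graph_edges = set()
    for i, j in edges:
        if abs(band[i] - band[j]) == 1:
            a = node_of[(band[i], find(i))]
            b = node_of[(band[j], find(j))]
            graph_edges.add((min(a, b), max(a, b)))
    return dict(members), sorted(graph_edges)
```

On a Y-shaped mesh with `mu = [0.0, 0.4, 0.9, 0.9]`, edges `(0, 1), (1, 2), (1, 3)` and `levels = 2`, the lower interval yields one node and each branch tip yields its own node, so the graph branches exactly where the shape does. Calling the function with 2, 4, 8, ... intervals gives successive resolutions; the parent-child links between levels that an MRG maintains are omitted here.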

Computer Vision and Pattern Recognition | 2008

Simultaneous super-resolution and 3D video using graph-cuts

Tony Tung; Shohei Nobuhara; Takashi Matsuyama

This paper presents a new method to increase the quality of 3D video, a new medium developed to represent 3D objects in motion. This representation is obtained from multi-view reconstruction techniques that require images recorded simultaneously by several video cameras. All cameras are calibrated and placed around a dedicated studio to fully surround the models. The limited quality and number of cameras may produce inaccurate 3D model reconstructions with low-quality textures. To overcome this issue, we first propose super-resolution (SR) techniques for 3D video: SR on multi-view images and SR on single-view video frames. Second, we propose to combine the super-resolution and dynamic 3D shape reconstruction problems into a single Markov random field (MRF) energy formulation, which is minimized using graph cuts. We thus jointly compute the optimal solution for super-resolved texture and 3D shape model reconstruction. Moreover, we propose a coarse-to-fine strategy that iteratively produces 3D video of increasing quality. Our experiments show the accuracy and robustness of the proposed technique on challenging 3D video sequences.
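The optimization step named in the abstract, minimizing an MRF energy with graph cuts, can be illustrated on the simplest case. The sketch below is not the paper's super-resolution or shape energy: it assumes binary labels, non-negative unary costs, and a Potts smoothness term, builds the standard s-t graph, and reads the labels off a minimum cut computed with Edmonds-Karp max-flow. Function names and costs are hypothetical; multi-label problems are usually handled with move-making algorithms such as α-expansion and faster max-flow solvers.

```python
from collections import deque


def energy(x, unary, pairs, lam):
    """E(x) = sum_i unary[i][x_i] + lam * number of neighbouring pairs that disagree."""
    return sum(unary[i][x[i]] for i in range(len(x))) + lam * sum(x[i] != x[j] for i, j in pairs)


def min_cut_labels(unary, pairs, lam):
    """Minimize energy() over binary labels with an s-t minimum cut.

    unary: (cost_if_0, cost_if_1) per site, non-negative
    pairs: neighbouring site pairs (i, j)
    lam:   non-negative Potts smoothness weight
    """
    n = len(unary)
    s, t = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i, (c0, c1) in enumerate(unary):
        cap[s][i] += c1  # cut when site i ends on the sink side (label 1)
        cap[i][t] += c0  # cut when site i stays on the source side (label 0)
    for i, j in pairs:
        cap[i][j] += lam  # exactly one of these two is cut when labels differ
        cap[j][i] += lam

    # Edmonds-Karp: augment along shortest residual paths until none remain.
    while True:
        prev = [-1] * (n + 2)
        prev[s] = s
        queue = deque([s])
        while queue and prev[t] == -1:
            u = queue.popleft()
            for v in range(n + 2):
                if prev[v] == -1 and cap[u][v] > 1e-12:
                    prev[v] = u
                    queue.append(v)
        if prev[t] == -1:
            break  # the last search explored the whole source side of the cut
        flow, v = float("inf"), t
        while v != s:  # bottleneck capacity along the path
            flow = min(flow, cap[prev[v]][v])
            v = prev[v]
        v = t
        while v != s:  # push flow and update residual capacities
            cap[prev[v]][v] -= flow
            cap[v][prev[v]] += flow
            v = prev[v]

    # Sites still reachable from the source take label 0, the others label 1.
    return [0 if prev[i] != -1 else 1 for i in range(n)]
```

For a noisy binary chain observed as `[0, 0, 1, 0, 0, 1, 1, 1]`, with a unit cost for disagreeing with the observation and `lam = 0.6`, the cut removes the isolated `1` because two boundary penalties cost more than one unary penalty.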

Computer Vision and Pattern Recognition | 2010

Dynamic surface matching by geodesic mapping for 3D animation transfer

Tony Tung; Takashi Matsuyama

This paper presents a novel approach that achieves complete matching of 3D dynamic surfaces. Surfaces are captured from multi-view video data and represented by sequences of 3D manifold meshes in motion (3D videos). We propose to perform dense surface matching between 3D video frames using geodesic diffeomorphisms. Our algorithm uses a coarse-to-fine strategy to derive a robust correspondence map; a probabilistic formulation is then coupled with a voting scheme to obtain locally unique matching candidates and a smooth mapping. The main advantage of the proposed technique over existing approaches is that it does not rely on color-based feature extraction. Hence, our method does not lose accuracy in poorly textured regions and is not restricted to video sequences of a single subject. Our complete surface mapping can therefore be applied to: (1) texture transfer between surface models extracted from different sequences, (2) dense motion flow estimation in 3D video, and (3) motion transfer from a 3D video to a non-animated 3D model. Experiments performed on challenging, publicly available real-world datasets show compelling results.
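Why geodesic information supports texture-independent matching can be shown with a small sketch; the paper's geodesic diffeomorphisms, probabilistic formulation, and voting scheme are not reproduced. It assumes a few corresponding anchor points are already known (for instance from a coarser level), approximates geodesic distances with Dijkstra's algorithm over mesh edges, describes each vertex by its distances to the anchors, and matches nearest descriptors. Function names are hypothetical.

```python
import heapq
import math


def geodesic_from(source, verts, edges):
    """Approximate geodesic distances from `source` via Dijkstra over mesh edges."""
    adj = {i: [] for i in range(len(verts))}
    for i, j in edges:
        w = math.dist(verts[i], verts[j])
        adj[i].append((j, w))
        adj[j].append((i, w))
    dist = [math.inf] * len(verts)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist


def match_by_geodesics(verts_a, edges_a, anchors_a, verts_b, edges_b, anchors_b):
    """Map each vertex of surface A to the vertex of surface B whose geodesic
    distances to the corresponding anchors are closest (anchors_a[k] <-> anchors_b[k])."""
    desc_a = list(zip(*[geodesic_from(a, verts_a, edges_a) for a in anchors_a]))
    desc_b = list(zip(*[geodesic_from(b, verts_b, edges_b) for b in anchors_b]))
    matches = []
    for d in desc_a:
        matches.append(min(range(len(desc_b)), key=lambda j: math.dist(d, desc_b[j])))
    return matches
```

Because geodesic distances are approximately preserved when a surface bends without stretching, a straight strip and the same strip bent at a right angle produce identical descriptors, so the matching succeeds without using any color.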

Archive | 2012

3D Video and Its Applications

Takashi Matsuyama; Shohei Nobuhara; Takeshi Takai; Tony Tung

This book presents a broad review of state-of-the-art 3D video production technologies and applications. The text opens with a concise introduction to the field, before examining the design and calibration methods for multi-view camera systems, including practical implementation technologies. A range of algorithms are then described for producing 3D video from video data. A selection of 3D video applications is also demonstrated. Features: describes real-time synchronized multi-view video capture, and object tracking with a group of active cameras; discusses geometric and photometric camera calibration, and 3D video studio design with active cameras; examines 3D shape and motion reconstruction, texture mapping and image rendering, and lighting environment estimation; demonstrates attractive 3D visualization, visual contents analysis and editing, 3D body action analysis, and data compression; highlights the remaining challenges and the exciting avenues for future research in 3D video technology.

International Conference on Computer Vision | 2009

Complete multi-view reconstruction of dynamic scenes from probabilistic fusion of narrow and wide baseline stereo

Tony Tung; Shohei Nobuhara; Takashi Matsuyama

This paper presents a novel approach to achieve accurate and complete multi-view reconstruction of dynamic scenes (or 3D videos). 3D videos consist of sequences of 3D models in motion captured by a surrounding set of video cameras. To date, 3D videos have been reconstructed using multi-view wide-baseline stereo (MVS) techniques. However, stereo correspondence problems remain difficult to solve: reconstruction accuracy falls when stereo photo-consistency is weak, and completeness is limited by self-occlusions. Most MVS techniques were designed to deal with static objects in a controlled environment and therefore cannot solve these issues. Hence, we propose to take advantage of the image content stability provided by each single-view video to recover any surface region visible to at least one camera. In particular, we present an original probabilistic framework to derive and predict the true surface of models. We propose to fuse multi-view structure-from-motion with robust 3D features obtained by MVS in order to significantly improve reconstruction completeness and accuracy. In a final step, a min-cut problem in which all exact features serve as priors is solved to reconstruct the 3D models. Experiments were conducted on synthetic and challenging real-world datasets to illustrate the robustness and accuracy of our method.
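The abstract does not spell out how the narrow-baseline and wide-baseline cues are combined, so the following is only one common way to fuse two probabilistic estimates, not the paper's framework. It assumes each cue gives, per voxel, a probability that the voxel lies on the true surface, and that the two cues are conditionally independent, in which case their log-odds add (minus the prior). The names `p_wide` and `p_narrow` are hypothetical.

```python
import math


def _logit(p, eps=1e-6):
    # Clamp so that probabilities of exactly 0 or 1 stay finite.
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def fuse_surface_probabilities(p_wide, p_narrow, prior=0.5):
    """Fuse per-voxel surface probabilities from a wide-baseline cue and a
    narrow-baseline cue, assuming conditional independence (log-odds fusion)."""
    fused = []
    for pw, pn in zip(p_wide, p_narrow):
        log_odds = _logit(pw) + _logit(pn) - _logit(prior)
        fused.append(1.0 / (1.0 + math.exp(-log_odds)))
    return fused
```

Two agreeing cues of 0.8 reinforce each other to 16/17 (about 0.94), contradictory cues of 0.8 and 0.2 cancel to 0.5, and an uninformative cue of 0.5 leaves the other unchanged. The negative log of such a fused probability could then act as a unary cost in a min-cut formulation.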

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2012

Topology Dictionary for 3D Video Understanding

Tony Tung; Takashi Matsuyama

This paper presents a novel approach to 3D video understanding. 3D video consists of a stream of 3D models of subjects in motion. The acquisition of long sequences requires large storage space (2 GB for 1 min). Moreover, it is tedious to browse data sets and extract meaningful information. We propose the topology dictionary to encode and describe 3D video content. The model consists of a topology-based shape descriptor dictionary, which can be generated from either extracted patterns or training sequences. The model relies on 1) topology description and classification using Reeb graphs, and 2) a Markov motion graph to represent topology change states. We show that the use of Reeb graphs as the high-level topology descriptor is relevant: it allows the dictionary to automatically model complex sequences, whereas other strategies would require prior knowledge of the shape and topology of the captured subjects. Our approach serves to encode 3D video sequences and can be applied to content-based description and summarization of 3D video sequences. Furthermore, topology class labeling during a learning process enables the system to perform content-based event recognition. Experiments were carried out on various 3D videos, and we showcase an application to 3D video progressive summarization using the topology dictionary.
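The Markov motion graph represents how topology states follow one another. As a minimal sketch, assuming every frame has already been assigned an integer topology class (for example by Reeb graph classification), the transition probabilities can be estimated by maximum-likelihood counting of consecutive label pairs. The function name is hypothetical, and the paper's handling of unseen transitions is not known here.

```python
from collections import Counter


def transition_matrix(labels, num_classes):
    """Estimate P[a][b] = Pr(class b at frame t+1 | class a at frame t)
    by counting transitions in a per-frame sequence of topology class labels."""
    counts = Counter(zip(labels, labels[1:]))
    matrix = []
    for a in range(num_classes):
        total = sum(counts[(a, b)] for b in range(num_classes))
        # A class never left in the training data gets an all-zero row.
        matrix.append([counts[(a, b)] / total if total else 0.0 for b in range(num_classes)])
    return matrix
```

For the label sequence `[0, 0, 1, 1, 0]`, each class stays put once and switches once, so every row is `[0.5, 0.5]`. Smoothing the counts (for example adding a small constant) is a common way to avoid zero-probability transitions.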

Computer Vision and Pattern Recognition | 2007

Topology matching for 3D video compression

Tony Tung; Francis J. M. Schmitt; Takashi Matsuyama

This paper presents a new technique to reduce the storage cost of high-quality 3D video. In 3D video, a sequence of 3D objects represents scenes in motion. Every frame is composed of one or several accurate 3D meshes with attached high-fidelity properties such as color and texture. Each frame is acquired at video rate, so the entire video sequence requires a huge amount of disk space. To overcome this issue, we propose an original approach using Reeb graphs, which are well-known topology-based shape descriptors. In particular, we take advantage of the properties of the augmented multiresolution Reeb graph to store the relevant information of the 3D model of each frame. This graph structure has proven efficient as a motion descriptor, as it can track similar nodes throughout the 3D video sequence. We can therefore describe and reconstruct the 3D models of all frames at a very low data cost. The algorithm has been implemented as a fully automatic 3D video compression system. Our experiments show the robustness and accuracy of the proposed technique by comparing reconstructed sequences against challenging real ones.
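Tracking similar graph nodes from frame to frame is what lets the representation stay compact. A hedged sketch of that tracking step, assuming each node is summarized by a small attribute vector (here a hypothetical 2D centroid) and that consecutive frames have the same number of nodes, finds the one-to-one assignment minimizing total attribute distance; the per-node displacements are the kind of small residual a compressor could store instead of full meshes. Brute force over permutations stands in for a proper assignment solver such as the Hungarian algorithm, and the function names are hypothetical.

```python
import itertools
import math


def track_nodes(nodes_t, nodes_next):
    """Assign each node of frame t to a node of frame t+1, minimizing the total
    Euclidean distance between attribute vectors.

    Brute force over permutations: only suitable for a handful of nodes.
    """
    best_cost, best_perm = math.inf, None
    for perm in itertools.permutations(range(len(nodes_next))):
        cost = sum(math.dist(nodes_t[i], nodes_next[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(best_perm)


def node_deltas(nodes_t, nodes_next, match):
    """Per-node attribute change between matched nodes of consecutive frames."""
    return [tuple(b - a for a, b in zip(nodes_t[i], nodes_next[j])) for i, j in enumerate(match)]
```

When the node order changes between frames but the nodes themselves barely move, the tracker recovers the permutation and the stored deltas stay small, for example `(0.5, 0.0)` instead of a full set of coordinates.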

IEEE Transactions on Human-Machine Systems | 2014

Multiparty Interaction Understanding Using Smart Multimodal Digital Signage

Tony Tung; Randy Gomez; Tatsuya Kawahara; Takashi Matsuyama

This paper presents a novel multimodal system designed for multi-party human-human interaction analysis. The design of human-machine interfaces for multiple users is challenging because simultaneous processing of actions and reactions has to be consistent. The proposed system consists of a large display equipped with multiple sensing devices: a microphone array, HD video cameras, and depth sensors. Multiple users positioned in front of the panel interact freely using voice or gesture while looking at the displayed content, without wearing any particular devices (such as motion capture sensors or head-mounted devices). Acoustic and visual information is captured and processed jointly using established and state-of-the-art techniques to obtain individual speech and gaze direction. Furthermore, a new framework is proposed to model audio-visual (A/V) multimodal interaction between verbal and nonverbal communication events. The dynamics of audio signals obtained from speaker diarization and of head poses extracted from video images are modeled using hybrid dynamical systems (HDS). We show that HDS temporal structure characteristics can be used to estimate the multimodal interaction level, which provides useful feedback for improving the multi-party communication experience. Experimental results using synthetic and real-world datasets of group communication, such as poster presentations, show the feasibility of the proposed multimodal system.
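A hybrid dynamical system couples a discrete state (mode) sequence with continuous dynamics inside each mode. The minimal sketch below, which does not reproduce the paper's model, assumes hypothetical scalar linear dynamics per mode and shows two pieces: simulating the switched system, and extracting the mode intervals whose start and end times are the kind of temporal structure (durations, onsets) that timing features can be built from. All names and parameters are illustrative.

```python
def simulate_hds(modes, a, b, x0):
    """Simulate a 1-D switched linear system x[t+1] = a[m] * x[t] + b[m],
    where m = modes[t] is the discrete mode active at step t."""
    xs = [x0]
    for m in modes:
        xs.append(a[m] * xs[-1] + b[m])
    return xs


def mode_intervals(modes):
    """Run-length encode a mode sequence into (mode, start, end) intervals,
    with `end` exclusive."""
    intervals = []
    start = 0
    for t in range(1, len(modes) + 1):
        if t == len(modes) or modes[t] != modes[start]:
            intervals.append((modes[start], start, t))
            start = t
    return intervals
```

Given interval lists for two streams, for instance a speaking-activity track and a head-turned-toward-display track, the difference between their onset times is one simple example of a cross-modal timing feature.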

European Conference on Computer Vision | 2014

On Mean Pose and Variability of 3D Deformable Models

Benjamin Allain; Jean-Sébastien Franco; Edmond Boyer; Tony Tung

We present a novel methodology for the analysis of complex object shapes in motion observed by multiple video cameras. In particular, we propose to learn local surface rigidity probabilities (i.e., deformations) and to estimate a mean pose over a temporal sequence. Local deformations can be used for rigidity-based dynamic surface segmentation, while a mean pose can be used as a sequence keyframe or a cluster prototype and therefore has numerous applications, such as motion synthesis or sequential alignment for compression or morphing. We take advantage of recent advances in surface tracking techniques to formulate a generative model of 3D temporal sequences using a probabilistic framework, which conditions shape fitting over all frames on a simple set of intrinsic surface rigidity properties. Surface tracking and rigidity variable estimation can then be formulated as an Expectation-Maximization inference problem and solved by alternately minimizing two nested fixed-point iterations. We show that this framework provides a new fundamental building block for various applications of shape analysis, and achieves tracking performance comparable to state-of-the-art surface tracking techniques on real datasets, even compared to approaches using strong kinematic priors such as rigid skeletons.
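The paper estimates rigidity and the mean pose jointly through EM inference, which is not reproduced here. As a much simpler heuristic that conveys the two quantities, the sketch assumes the mesh has already been tracked (same vertex ordering in every frame) and rigidly aligned. It takes the per-vertex average as a mean pose and scores each edge's rigidity by how little its length varies over time. The exponential score and the `sigma` parameter are assumptions, and the function names are hypothetical.

```python
import math


def mean_pose(frames):
    """Per-vertex mean position; frames[t][v] is the (x, y, z) of vertex v at time t."""
    n_frames = len(frames)
    return [
        tuple(sum(frame[v][k] for frame in frames) / n_frames for k in range(3))
        for v in range(len(frames[0]))
    ]


def edge_rigidity(frames, edges, sigma):
    """Heuristic rigidity score per edge, exp(-var(length) / sigma^2):
    close to 1 when the edge length barely changes over the sequence."""
    scores = []
    for i, j in edges:
        lengths = [math.dist(f[i], f[j]) for f in frames]
        mean = sum(lengths) / len(lengths)
        var = sum((length - mean) ** 2 for length in lengths) / len(lengths)
        scores.append(math.exp(-var / sigma ** 2))
    return scores
```

An edge whose length stays fixed scores exactly 1, while an edge stretching from length 1 to 3 across two frames has variance 1 and, with `sigma = 1`, scores about 0.37; thresholding such scores gives a crude rigid/non-rigid segmentation.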

Computer Vision and Pattern Recognition | 2009

Topology dictionary with Markov model for 3D video content-based skimming and description

Tony Tung; Takashi Matsuyama

This paper presents a novel approach to skim and describe 3D videos. 3D video is an imaging technology that consists of a stream of 3D models in motion captured by a synchronized set of video cameras. Each frame is composed of one or several 3D models, and therefore the acquisition of long sequences at video rate requires massive storage devices. In order to reduce the storage cost while keeping relevant information, we propose to encode 3D video sequences using a topology-based shape descriptor dictionary. This dictionary is either generated from a set of extracted patterns or learned from training input sequences with semantic annotations. It relies on an unsupervised 3D shape-based clustering of the dataset by Reeb graphs, and features a Markov network to characterize topological changes. The approach allows content-based compression and skimming with accurate recovery of sequences, and can handle complex topological changes. Redundancies are detected and skipped based on a probabilistic discrimination process. Semantic description of video sequences is then performed automatically. In addition, encoding of forthcoming frames is achieved using a multiresolution matching scheme, which allows action recognition in 3D. Our experiments were performed on complex 3D video sequences. We demonstrate the robustness and accuracy of the 3D video skimming with dramatically low-bitrate coding and a high compression ratio.
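The abstract detects and skips redundant frames with a probabilistic discrimination process. A simpler deterministic stand-in illustrates the skimming idea. It assumes each frame already has a fixed-length shape descriptor (a hypothetical vector summarizing its Reeb graph) and keeps a frame only when it departs from the last kept frame by more than a threshold, so each skipped frame can be represented by the preceding keyframe. The function name and threshold are hypothetical.

```python
import math


def skim(descriptors, threshold):
    """Return indices of frames to keep: a frame is kept only when its shape
    descriptor differs from the last kept frame's by more than `threshold`."""
    if not descriptors:
        return []
    kept = [0]
    for i in range(1, len(descriptors)):
        if math.dist(descriptors[i], descriptors[kept[-1]]) > threshold:
            kept.append(i)
    return kept
```

With one-dimensional descriptors `[0.0], [0.1], [1.0], [1.05], [2.0]` and a threshold of 0.5, frames 1 and 3 are near-duplicates of their predecessors and are skipped. Comparing against the last kept frame, rather than the previous frame, prevents slow drift from being skipped indefinitely.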
