Sehoon Yea
Mitsubishi Electric Research Laboratories
Publications
Featured research published by Sehoon Yea.
Picture Coding Symposium | 2009
Kwan-Jung Oh; Sehoon Yea; Yo-Sung Ho
Depth image-based rendering (DIBR) is generally used to synthesize virtual view images in free viewpoint television (FTV) and three-dimensional (3-D) video. One of the main problems in DIBR is how to fill the holes caused by disocclusion regions and inaccurate depth values. In this paper, we propose a new hole-filling method using a depth-based inpainting technique. Experimental results show that the proposed hole-filling method provides improved rendering quality both objectively and subjectively.
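As an illustration of the general idea (not the authors' inpainting algorithm), the sketch below fills each disocclusion pixel from its farthest valid 4-neighbor, so that background rather than foreground colors propagate into the hole; the function name, parameters, and iteration scheme are assumptions.

```python
import numpy as np

def fill_holes_depth_guided(color, depth, hole_mask, max_iters=100):
    """Minimal depth-guided hole-filling sketch for DIBR.

    color: (H, W, 3) float array, the warped virtual view
    depth: (H, W) float array, larger value = farther (background)
    hole_mask: (H, W) bool array, True where disocclusions are

    Each hole pixel is filled from its farthest (background-side)
    valid neighbor, so foreground colors do not bleed into holes.
    """
    color, depth, holes = color.copy(), depth.copy(), hole_mask.copy()
    H, W = holes.shape
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    for _ in range(max_iters):
        if not holes.any():
            break
        filled_any = False
        ys, xs = np.nonzero(holes)
        for y, x in zip(ys, xs):
            best = None  # (depth, color) of the farthest valid neighbor
            for dy, dx in offsets:
                ny, nx = y + dy, x + dx
                if 0 <= ny < H and 0 <= nx < W and not holes[ny, nx]:
                    if best is None or depth[ny, nx] > best[0]:
                        best = (depth[ny, nx], color[ny, nx])
            if best is not None:
                depth[y, x], color[y, x] = best[0], best[1]
                holes[y, x] = False
                filled_any = True
        if not filled_any:
            break
    return color, depth
```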
Signal Processing: Image Communication | 2009
Sehoon Yea; Anthony Vetro
We propose a rate-distortion-optimized framework that incorporates view synthesis for improved prediction in multiview video coding. In the proposed scheme, auxiliary information, including depth data, is encoded and used at the decoder to generate the view synthesis prediction data. The proposed method employs optimal mode decision including view synthesis prediction, and sub-pixel reference matching to improve prediction accuracy of the view synthesis prediction. Novel variants of the skip and direct modes are also presented, which infer the depth and correction vector information from neighboring blocks in a synthesized reference picture to reduce the bits needed for the view synthesis prediction mode. We demonstrate two multiview video coding scenarios in which view synthesis prediction is employed. In the first scenario, the goal is to improve the coding efficiency of multiview video where block-based depths and correction vectors are encoded by CABAC in a lossless manner on a macroblock basis. A variable block-size depth/motion search algorithm is described. Experimental results demonstrate that view synthesis prediction does provide some coding gains when combined with disparity-compensated prediction. In the second scenario, the goal is to use view synthesis prediction for reducing rate overhead incurred by transmitting depth maps for improved support of 3DTV and free-viewpoint video applications. It is assumed that the complete depth map for each view is encoded separately from the multiview video and used at the receiver to generate intermediate views. We utilize this information for view synthesis prediction to improve overall coding efficiency. Experimental results show that the rate overhead incurred by coding depth maps of varying quality could be offset by utilizing the proposed view synthesis prediction techniques to reduce the bitrate required for coding multiview video.
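The Lagrangian mode decision at the core of such a framework can be sketched in a few lines: every candidate predictor for a block, including the view-synthesis prediction, is scored with J = D + λR and the cheapest mode wins. This is a minimal illustration under assumed names (choose_mode, rd_cost) and an SSD distortion, not the paper's encoder.

```python
import numpy as np

def rd_cost(block, prediction, rate_bits, lam):
    """Lagrangian cost J = D + lambda * R, with SSD as the distortion D."""
    d = float(np.sum((block - prediction) ** 2))
    return d + lam * rate_bits

def choose_mode(block, candidates, lam):
    """Pick the mode with minimum RD cost.

    candidates: dict mode_name -> (prediction_block, estimated_rate_bits),
    e.g. temporal, disparity-compensated, and view-synthesis prediction.
    """
    costs = {m: rd_cost(block, p, r, lam) for m, (p, r) in candidates.items()}
    return min(costs, key=costs.get), costs

if __name__ == "__main__":
    # hypothetical 8x8 block with three competing predictors: the
    # view-synthesis candidate is slightly noisier but cheaper to signal
    rng = np.random.default_rng(0)
    blk = rng.uniform(0, 255, (8, 8))
    cands = {
        "temporal":       (blk + rng.normal(0, 2, (8, 8)), 120),
        "disparity":      (blk + rng.normal(0, 2, (8, 8)), 95),
        "view_synthesis": (blk + rng.normal(0, 3, (8, 8)), 60),
    }
    print(choose_mode(blk, cands, lam=10.0)[0])
```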
IEEE Signal Processing Letters | 2009
Kwan-Jung Oh; Sehoon Yea; Anthony Vetro; Yo-Sung Ho
A depth image represents three-dimensional (3-D) scene information and is commonly used for depth image-based rendering (DIBR) to support 3-D video and free-viewpoint video applications. The virtual view is generally rendered by the DIBR technique, and its quality depends highly on the quality of the depth image. Thus, efficient depth coding is crucial to realizing a 3-D video system. In this letter, we propose a depth reconstruction filter and depth down/up sampling techniques to improve depth coding performance. Experimental results demonstrate that the proposed methods reduce the bit rate for depth coding and achieve better rendering quality.
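The down/up-sampling idea can be sketched as follows: the depth map is decimated before coding and restored at the decoder by up-sampling plus a reconstruction filter. A plain median filter stands in below for the paper's reconstruction filter; function names and parameters are illustrative.

```python
import numpy as np

def downsample_depth(depth, factor=2):
    """Decimate the depth map before coding (sketch: simple subsampling)."""
    return depth[::factor, ::factor]

def upsample_depth(depth_small, factor=2):
    """Nearest-neighbor up-sampling back to full resolution at the decoder."""
    return np.repeat(np.repeat(depth_small, factor, axis=0), factor, axis=1)

def reconstruction_filter(depth, k=3):
    """Stand-in reconstruction filter: a k x k median, which suppresses
    coding noise while keeping depth edges sharp -- and sharp depth edges
    matter most for DIBR rendering quality."""
    H, W = depth.shape
    pad = k // 2
    padded = np.pad(depth, pad, mode="edge")
    out = np.empty((H, W), dtype=np.float64)
    for y in range(H):
        for x in range(W):
            out[y, x] = np.median(padded[y:y + k, x:x + k])
    return out
```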
Proceedings of SPIE | 2008
Anthony Vetro; Sehoon Yea; Aljoscha Smolic
There has been increased momentum recently in the production of 3D content for cinema applications; for the most part, this has been limited to stereo content. There are also a variety of display technologies on the market that support 3DTV, each offering a different viewing experience and having different input requirements. More specifically, stereoscopic displays support stereo content and require glasses, while auto-stereoscopic displays avoid the need for glasses by rendering view-dependent stereo pairs for a multitude of viewing angles. To realize high quality auto-stereoscopic displays, multiple views of the video must either be provided as input to the display, or these views must be created locally at the display. The former approach has difficulties in that the production environment is typically limited to stereo, and transmission bandwidth for a large number of views is not likely to be available. This paper discusses an emerging 3D data format that enables the latter approach to be realized. A new framework for efficiently representing a 3D scene and enabling the reconstruction of an arbitrarily large number of views prior to rendering is introduced. Several design challenges are also highlighted through experimental results.
IEEE Transactions on Image Processing | 2009
Dung Trung Vo; Truong Q. Nguyen; Sehoon Yea; Anthony Vetro
A fuzzy filter adaptive to both sample activity and the relative position between samples is proposed to reduce artifacts in compressed multidimensional signals. For JPEG images, the fuzzy spatial filter is based on the directional characteristics of ringing artifacts along strong edges. For compressed video sequences, a motion-compensated spatiotemporal filter (MCSTF) is applied to intraframe and interframe pixels to deal with both spatial and temporal artifacts. A new metric that considers the tracking characteristic of human eyes is proposed to evaluate flickering artifacts. Simulations on compressed images and videos show improvement in artifact reduction of the proposed adaptive fuzzy filter over conventional spatial or temporal filtering approaches.
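The basic fuzzy-weighting mechanism can be sketched as below: each neighbor contributes with a membership weight that decays with both its intensity difference from the center sample and its spatial distance, so samples similar in value and close in position dominate the average while strong edges are preserved. The paper's filter additionally adapts to local activity and ringing direction, and its MCSTF variant applies the idea along motion trajectories; this minimal spatial sketch with Gaussian memberships attempts none of that.

```python
import numpy as np

def fuzzy_filter(img, k=5, sigma_val=20.0, sigma_pos=2.0):
    """Sketch of fuzzy filtering a grayscale image with a k x k window.
    Gaussian membership functions and all parameters are illustrative."""
    H, W = img.shape
    pad = k // 2
    padded = np.pad(img.astype(np.float64), pad, mode="reflect")
    ys, xs = np.mgrid[-pad:pad + 1, -pad:pad + 1]
    pos_w = np.exp(-(ys**2 + xs**2) / (2 * sigma_pos**2))    # position term
    out = np.empty((H, W))
    for y in range(H):
        for x in range(W):
            patch = padded[y:y + k, x:x + k]
            diff = patch - padded[y + pad, x + pad]
            val_w = np.exp(-(diff**2) / (2 * sigma_val**2))  # value term
            w = pos_w * val_w
            out[y, x] = np.sum(w * patch) / np.sum(w)
    return out
```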
International Journal of Imaging Systems and Technology | 2010
Kwan-Jung Oh; Sehoon Yea; Anthony Vetro; Yo-Sung Ho
Virtual view synthesis is one of the most important techniques to realize free viewpoint television and three-dimensional (3D) video. In this article, we propose a view synthesis method to generate high-quality intermediate views in such applications, along with new evaluation metrics, spatial peak signal-to-noise ratio and temporal peak signal-to-noise ratio, to measure spatial and temporal consistency, respectively. The proposed view synthesis method consists of five major steps: depth preprocessing, depth-based 3D warping, depth-based histogram matching, base plus assistant view blending, and depth-based hole filling. The efficiency of the proposed view synthesis method has been verified by evaluating the quality of synthesized images with various metrics such as peak signal-to-noise ratio, structural similarity, the discrete cosine transform (DCT)-based video quality metric, and the newly proposed metrics. We have also confirmed that the synthesized images are objectively and subjectively natural.
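A skeleton of the five-step pipeline is sketched below, with each stage reduced to a deliberately simple stand-in: box-blur depth preprocessing, horizontal-disparity warping (assuming rectified parallel cameras), mean/variance histogram matching, base-plus-assistant blending, and left-neighbor hole filling. It mirrors the structure of the method, not the actual algorithms used in the paper.

```python
import numpy as np

def preprocess_depth(depth):
    """Step 1 (stand-in): lightly smooth the depth map with a 3x3 box blur."""
    H, W = depth.shape
    p = np.pad(depth.astype(np.float64), 1, mode="edge")
    return sum(p[dy:dy + H, dx:dx + W]
               for dy in range(3) for dx in range(3)) / 9.0

def warp_view(color, depth, gain):
    """Step 2 (stand-in): depth-based 3D warping reduced to a per-pixel
    horizontal disparity shift; returns the warped view and its hole mask."""
    H, W = depth.shape
    out = np.zeros_like(color, dtype=np.float64)
    hole = np.ones((H, W), dtype=bool)
    for y in range(H):
        for x in range(W):
            nx = x + int(round(gain * depth[y, x]))
            if 0 <= nx < W:
                out[y, nx] = color[y, x]
                hole[y, nx] = False
    return out, hole

def match_histogram(src, ref):
    """Step 3 (stand-in): approximate histogram matching by matching
    global mean and standard deviation to the reference view."""
    s = (src - src.mean()) / (src.std() + 1e-8)
    return s * ref.std() + ref.mean()

def blend(base, base_hole, assist, assist_hole):
    """Step 4: base-plus-assistant blending; assistant pixels are used
    only where the base view has holes."""
    out = np.where(base_hole[..., None], assist, base)
    return out, base_hole & assist_hole

def fill_holes(color, hole):
    """Step 5 (stand-in): fill remaining holes from the nearest valid
    left neighbor."""
    out = color.copy()
    for y in range(out.shape[0]):
        for x in range(1, out.shape[1]):
            if hole[y, x]:
                out[y, x] = out[y, x - 1]
    return out

def synthesize_view(left, left_depth, right, right_depth, gain):
    """Chain the five steps to render one intermediate view."""
    base, bh = warp_view(left, preprocess_depth(left_depth), gain)
    assist, ah = warp_view(right, preprocess_depth(right_depth), -gain)
    assist = match_histogram(assist, base)
    blended, hole = blend(base, bh, assist, ah)
    return fill_holes(blended, hole)
```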
International Conference on Image Processing | 2007
Anthony Vetro; Sehoon Yea; Matthias Zwicker; Wojciech Matusik; Hanspeter Pfister
This paper addresses signal processing issues related to coded representation, reconstruction and rendering of multiview video for 3D displays. We provide an overview of standardization efforts for multiview video that are aimed at reducing data rates required to represent the multiview video in compressed form. We then present an anti-aliasing filtering technique that effectively eliminates ghosting artifacts when rendering multiview video on 3D displays. Since high-frequency components of the signal are removed, substantial reductions in the compressed data rate could also be realized. Finally, we discuss the importance of scalability in the context of multiview video coding and suggest a combined anti-aliasing and scalable decoding scheme to minimize decoding resources for a given 3D display.
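The gist of such anti-aliasing can be illustrated with a small sketch: frequencies that a multiview display cannot reproduce without ghosting grow with a pixel's distance from the display plane, so the filter band-limits depth-dependently. The variable-radius box blur and all parameters below are assumptions, not the paper's filter.

```python
import numpy as np

def depth_adaptive_lowpass(img, depth, plane_depth, strength=0.05, rmax=4):
    """Sketch of display anti-aliasing: blur the image more strongly the
    farther a pixel lies from the display plane, where inter-perspective
    aliasing (ghosting) on multiview 3D screens is worst.

    img: (H, W, 3) float array; depth: (H, W) float array.
    """
    H, W = depth.shape
    out = img.astype(np.float64).copy()
    padded = np.pad(out, ((rmax, rmax), (rmax, rmax), (0, 0)), mode="reflect")
    for y in range(H):
        for x in range(W):
            # blur radius grows with distance from the display plane
            r = min(rmax, int(strength * abs(depth[y, x] - plane_depth)))
            if r > 0:
                win = padded[y + rmax - r:y + rmax + r + 1,
                             x + rmax - r:x + rmax + r + 1]
                out[y, x] = win.mean(axis=(0, 1))
    return out
```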
Digital Television Conference | 2007
Serdar Ince; Emin Martinian; Sehoon Yea; Anthony Vetro
Compression of multiview video is required in an end-to-end 3D system to reduce the amount of visual information. Since multiple cameras usually share a common field of view, high compression ratios can be achieved if both temporal and inter-view redundancy are exploited. View synthesis prediction is a new coding tool for multiview video that essentially generates virtual views of a scene using images from neighboring cameras and estimated depth values. In this work, we consider depth estimation for view synthesis in multiview video encoding. We focus on generating smooth and accurate depth maps that can be efficiently coded. We present several improvements to the reference block-based depth estimation approach and demonstrate that the proposed method of depth estimation is not only effective for view synthesis prediction, but also produces depth maps that require far fewer bits to code.
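The flavor of smoothness-regularized, block-based depth estimation can be sketched as follows: each block's disparity minimizes a matching cost plus a penalty against its already-decided neighbors, which biases the result toward the piecewise-smooth maps that are cheap to code. This is an illustrative stand-in, not the reference algorithm or the paper's improvements to it.

```python
import numpy as np

def estimate_block_disparity(left, right, block=8, max_disp=32, lam=2.0):
    """Sketch of regularized block-based disparity estimation.

    Assumes rectified grayscale views of equal size; for each block the
    cost is SAD plus lam * |d - d_neighbor| against the left and upper
    neighbors, favoring smooth (cheaply codable) disparity maps.
    """
    H, W = left.shape
    bh, bw = H // block, W // block
    disp = np.zeros((bh, bw), dtype=np.int64)
    for by in range(bh):
        for bx in range(bw):
            y0, x0 = by * block, bx * block
            tgt = left[y0:y0 + block, x0:x0 + block].astype(np.float64)
            best_d, best_cost = 0, np.inf
            for d in range(max_disp + 1):
                if x0 - d < 0:
                    break
                ref = right[y0:y0 + block, x0 - d:x0 - d + block]
                cost = np.abs(tgt - ref).sum()
                if bx > 0:
                    cost += lam * abs(d - disp[by, bx - 1])
                if by > 0:
                    cost += lam * abs(d - disp[by - 1, bx])
                if cost < best_cost:
                    best_d, best_cost = d, cost
            disp[by, bx] = best_d
    return disp
```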
IEEE Signal Processing Magazine | 2007
Matthias Zwicker; Anthony Vetro; Sehoon Yea; Wojciech Matusik; Hanspeter Pfister
Multiview three-dimensional (3-D) displays offer viewing of high-resolution stereoscopic images from arbitrary positions without glasses. This article surveys different approaches to developing signal processing algorithms for these displays. Such displays consist of view-dependent pixels that reveal a different color according to the viewing angle, so the left and right eyes of an observer see slightly different images on the screen. This leads to the perception of 3-D depth and parallax effects when the observer moves. Although the basic optical principles of multiview auto-stereoscopy have been known for over a century, only recently have displays with increased resolution, or systems based on multiple projectors, made this approach practical.
Picture Coding Symposium | 2009
Sehoon Yea; Anthony Vetro
It is well-known that large depth-coding errors typically occurring around depth edge areas lead to distorted object boundaries in the synthesized texture images. This paper proposes a multi-layered coding approach for depth images as a complement to the popular edge-aware approaches such as those based on platelets. It is shown that guaranteeing a near-lossless bound on the depth values around the edges by adding extra enhancement layers is an effective way to improve the visual quality of the synthesized images.
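A minimal sketch of the layering idea: a coarsely quantized base layer is complemented by an enhancement layer that re-quantizes the residual only around depth edges with an odd step of 2Δ+1, which caps the reconstruction error there at Δ (near-lossless). The simple gradient-based edge detector and all parameters are illustrative, not the paper's construction.

```python
import numpy as np

def edge_mask(depth, thresh=8):
    """Mark pixels at large depth discontinuities via a simple gradient test."""
    d = depth.astype(np.int64)
    gy = np.abs(np.diff(d, axis=0, prepend=d[:1]))
    gx = np.abs(np.diff(d, axis=1, prepend=d[:, :1]))
    return (gx > thresh) | (gy > thresh)

def layered_depth_code(depth, q=16, bound=2, thresh=8):
    """Base layer plus an edge-area enhancement layer with a near-lossless
    guarantee: around detected edges the reconstruction error never
    exceeds `bound`, because an integer residual quantized with odd step
    2*bound+1 is off by at most `bound`."""
    d = depth.astype(np.int64)
    base = (d // q) * q + q // 2          # coarse base-layer reconstruction
    err = d - base
    mask = edge_mask(depth, thresh)
    step = 2 * bound + 1
    enh = np.where(mask, np.round(err / step).astype(np.int64) * step, 0)
    return base + enh, mask
```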