Publications


Featured research published by Ismael Daribo.


International Conference on Image Processing | 2012

Arithmetic edge coding for arbitrarily shaped sub-block motion prediction in depth video compression

Ismael Daribo; Gene Cheung; Dinei A. F. Florêncio

Depth map compression is important for compact representation of 3D visual data in the “texture-plus-depth” format, where texture and depth maps of multiple closely spaced viewpoints are encoded and transmitted. A decoder can then freely synthesize any chosen intermediate view via depth-image-based rendering (DIBR), using neighboring coded texture and depth maps as anchors. In this work, we leverage the observation that “pixels of similar depth have similar motion” to efficiently encode depth video. Specifically, we divide a depth block containing two zones of distinct values (e.g., foreground and background) into two sub-blocks along the dividing edge before performing separate motion prediction. While such arbitrarily shaped sub-block motion prediction can lead to very small prediction residuals (and hence few bits required to code them), it incurs an overhead to losslessly encode the dividing edges for sub-block identification. To minimize this overhead, we first devise an edge prediction scheme based on linear regression to predict the next edge direction in a contiguous contour. From the predicted edge direction, we assign probabilities to each possible edge direction using the von Mises distribution; these probabilities are then fed to a conditional arithmetic codec for entropy coding. Experimental results show an average overall bitrate reduction of up to 30% over a classical H.264 implementation.
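
The probability model above lends itself to a compact sketch. The following Python snippet (our illustration, not the authors' code) discretizes edge directions into eight 45-degree steps, centers a von Mises distribution on the direction predicted by linear regression, and normalizes the weights into a probability table a conditional arithmetic coder could consume; the function name and the concentration parameter kappa are assumptions.

```python
import numpy as np

def edge_direction_probabilities(predicted_angle: float, kappa: float = 4.0) -> np.ndarray:
    """Assign probabilities to 8 candidate edge directions with a von Mises
    distribution centered on the direction predicted by linear regression.

    predicted_angle: predicted edge direction in radians.
    kappa: concentration parameter (larger = more peaked around the prediction).
    """
    # Eight discrete edge directions (one per 45-degree step).
    directions = np.arange(8) * (2 * np.pi / 8)
    # Unnormalized von Mises density: exp(kappa * cos(theta - mu)).
    # The Bessel-function normalizer cancels when we normalize below.
    weights = np.exp(kappa * np.cos(directions - predicted_angle))
    return weights / weights.sum()

# Example: the regression predicts the contour continues at ~45 degrees.
probs = edge_direction_probabilities(np.pi / 4)
print(probs)  # most probability mass on direction index 1 (45 degrees)
```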


IEEE Transactions on Image Processing | 2014

Arbitrarily Shaped Motion Prediction for Depth Video Compression Using Arithmetic Edge Coding

Ismael Daribo; Dinei A. F. Florêncio; Gene Cheung

Depth image compression is important for compact representation of 3D visual data in texture-plus-depth format, where texture and depth maps from one or more viewpoints are encoded and transmitted. A decoder can then synthesize a freely chosen virtual view via depth-image-based rendering using nearby coded texture and depth maps as reference. Further, depth information can be used in image processing applications beyond view synthesis, such as object identification and segmentation. In this paper, we leverage the observation that neighboring pixels of similar depth have similar motion to efficiently encode depth video. Specifically, we divide a depth block containing two zones of distinct values (e.g., foreground and background) into two arbitrarily shaped regions (sub-blocks) along the dividing boundary before performing separate motion prediction (MP). While such arbitrarily shaped sub-block MP can lead to very small prediction residuals (resulting in few bits required for residual coding), it incurs an overhead to transmit the dividing boundaries for sub-block identification at the decoder. To minimize this overhead, we first devise a scheme called arithmetic edge coding (AEC) to efficiently code boundaries that divide blocks into sub-blocks. Specifically, we propose to incorporate the geometrical correlation of the boundary into an adaptive arithmetic coder in the form of a statistical model. Then, we propose two optimization procedures to further improve the edge coding performance of AEC for a given depth image. The first procedure operates within a code block, and allows lossy compression of the detected block boundary to lower the cost of AEC, with an option to adjust boundary depth pixel values to match the new boundary, provided the adjusted pixels do not adversely affect synthesized view distortion. The second procedure operates across code blocks, and systematically identifies blocks along an object contour that should be coded using sub-block MP via a rate-distortion optimized trellis. Experimental results show an average overall bitrate reduction of up to 33% over a classical H.264/AVC implementation.
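
To make the second optimization procedure concrete, here is a minimal Viterbi-style sketch of a rate-distortion optimized trellis over a sequence of blocks along a contour, with two states per block (regular MP vs. sub-block MP). All inputs (per-block Lagrangian costs and a mode-switch penalty) are hypothetical placeholders, not the paper's actual cost model.

```python
import numpy as np

def rd_trellis(cost_regular, cost_subblock, switch_penalty):
    """Viterbi-style selection of which blocks along an object contour use
    sub-block motion prediction (state 1) versus regular MP (state 0).

    cost_regular, cost_subblock: per-block Lagrangian costs
    (rate + lambda * distortion), assumed precomputed by the encoder.
    switch_penalty: extra rate for toggling the mode between neighbors.
    Returns the RD-optimal 0/1 mode sequence.
    """
    n = len(cost_regular)
    local = np.column_stack([cost_regular, cost_subblock])  # shape (n, 2)
    cost = np.full((n, 2), np.inf)
    back = np.zeros((n, 2), dtype=int)
    cost[0] = local[0]
    for i in range(1, n):
        for s in range(2):
            # Transition cost from each previous state into state s.
            trans = cost[i - 1] + switch_penalty * (np.arange(2) != s)
            back[i, s] = int(np.argmin(trans))
            cost[i, s] = trans[back[i, s]] + local[i, s]
    modes = [int(np.argmin(cost[-1]))]          # trace the cheapest path back
    for i in range(n - 1, 0, -1):
        modes.append(back[i, modes[-1]])
    return modes[::-1]

# Example: the two middle blocks are much cheaper with sub-block MP, so the
# optimal path toggles the mode once in each direction -> [0, 1, 1, 0].
print(rd_trellis([1.0, 5.0, 5.0, 1.0], [4.0, 1.0, 1.0, 4.0], switch_penalty=0.5))
```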


Picture Coding Symposium | 2012

Arbitrarily shaped sub-block motion prediction in texture map compression using depth information

Ismael Daribo; Dinei A. F. Florêncio; Gene Cheung

When transmitting the so-called “texture-plus-depth” video format, texture and depth maps from the same viewpoint exhibit high correlation. Coded bits from one map can then be used as side information to encode the other. In this paper, we propose to use the depth information to divide the corresponding block in the texture map into arbitrarily shaped regions (sub-blocks) for separate motion estimation (ME) and motion compensation (MC). We implemented our proposed sub-block motion prediction (MP) method for texture map coding using depth information as a new coding mode (z-mode) in H.264. In practical experiments, however, one can observe either a misalignment between texture and depth edges, or an aliasing effect at the texture boundaries. To overcome this issue, z-mode offers two MC types: i) non-overlapping MC, and ii) overlapping MC. In the latter case, overlapped sub-blocks after ME are alpha-blended using a properly designed filter. Moreover, the MV of each sub-block in z-mode is predicted using a Laplacian-weighted average of the MVs of neighboring blocks of similar depth. Experimental results show that using z-mode, coding performance of the texture map can be improved by up to 0.7 dB compared to a native H.264 implementation at high bitrates.
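
As a sketch of the Laplacian-weighted MV prediction mentioned above (our reading of it, with a hypothetical decay parameter sigma), each neighboring block's MV is weighted by exp(-|depth difference| / sigma), so neighbors of similar depth dominate the predicted vector:

```python
import numpy as np

def predict_mv(neighbor_mvs, neighbor_depths, block_depth, sigma=8.0):
    """Predict the motion vector of a sub-block as a Laplacian-weighted
    average of neighboring blocks' MVs, weighting neighbors of similar
    depth more heavily. sigma is a hypothetical decay parameter.
    """
    mvs = np.asarray(neighbor_mvs, dtype=float)        # shape (N, 2)
    depths = np.asarray(neighbor_depths, dtype=float)  # shape (N,)
    # Laplacian kernel: weight decays exponentially with |depth difference|.
    weights = np.exp(-np.abs(depths - block_depth) / sigma)
    return weights @ mvs / weights.sum()

# Example: three neighbors; the two at depth ~50 dominate the prediction,
# while the far-away block at depth 200 is effectively ignored.
print(predict_mv([(2, 0), (3, 1), (-8, 4)], [50, 52, 200], block_depth=51))
```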


3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video | 2012

R-D optimized auxiliary information for inpainting-based view synthesis

Ismael Daribo; Gene Cheung; Thomas Maugey; Pascal Frossard

Texture and depth maps of two neighboring camera viewpoints are typically required for synthesis of an intermediate virtual view via depth-image-based rendering (DIBR). However, the bitrate overhead required for reconstruction of multiple texture and depth maps at the decoder can be large. The performance of multiview video encoders such as MVC is limited by the simple fact that the chosen representation is inherently redundant: a texture or depth pixel visible from both camera viewpoints is represented twice. In this paper, we propose an alternative 3D scene representation without such redundancy, from which the decoder can still reconstruct texture and depth maps of two camera viewpoints for DIBR-based synthesis of intermediate views. In particular, we propose to first encode texture and depth videos of a single viewpoint, which are used to synthesize the uncoded viewpoint via DIBR at the decoder. Then, we encode additional rate-distortion (RD) optimal auxiliary information (AI) to guide an inpainting-based hole-filling algorithm at the decoder and complete the missing information due to disocclusion. For a missing pixel patch in the synthesized view, the AI can: i) be skipped, letting the decoder retrieve the missing information by itself; ii) identify a suitable spatial region in the reconstructed view for patch-matching; or iii) explicitly encode the missing pixel patch if no satisfactory patch can be found in the reconstructed view. Experimental results show that our alternative representation can achieve up to 41% bit savings compared to an H.264/MVC implementation.
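
The three AI options map naturally onto a per-patch mode decision. A minimal sketch, assuming the encoder has already computed a Lagrangian (rate + lambda * distortion) cost for each option; the function name and the example numbers are hypothetical:

```python
def choose_ai_mode(skip_cost, region_cost, explicit_cost):
    """Pick the rate-distortion-cheapest auxiliary-information mode for a
    missing patch in the synthesized view. The three modes mirror options
    i)-iii) in the abstract; costs are assumed Lagrangian values."""
    modes = {
        "skip": skip_cost,           # i) decoder inpaints unaided
        "region_hint": region_cost,  # ii) signal where to patch-match
        "explicit": explicit_cost,   # iii) transmit the patch pixels
    }
    return min(modes, key=modes.get)

# Example: the region hint is cheaper than inpainting blindly or sending pixels.
print(choose_ai_mode(skip_cost=12.5, region_cost=9.0, explicit_cost=30.2))
```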


IEEE Transactions on Image Processing | 2013

Navigation Domain Representation For Interactive Multiview Imaging

Thomas Maugey; Ismael Daribo; Gene Cheung; Pascal Frossard

Enabling users to interactively navigate through different viewpoints of a static scene is an interesting new functionality in 3D streaming systems. While it opens exciting perspectives toward rich multimedia applications, it requires the design of novel representations and coding techniques to solve the new challenges imposed by interactive navigation. In particular, the encoder must prepare a priori a compressed media stream that is flexible enough to enable the free selection of multiview navigation paths by different streaming media clients. Interactivity clearly brings new design constraints: the encoder is unaware of the exact decoding process, while the decoder has to reconstruct information from incomplete subsets of data, since the server generally cannot transmit images for all possible viewpoints due to resource constraints. In this paper, we propose a novel multiview data representation that permits us to satisfy bandwidth and storage constraints in an interactive multiview streaming system. In particular, we partition the multiview navigation domain into segments, each of which is described by a reference image (color and depth data) and some auxiliary information. The auxiliary information enables the client to recreate any viewpoint in the navigation segment via view synthesis. The decoder is then able to navigate freely within the segment without further data requests to the server; it requests additional data only when it moves to a different segment. We discuss the benefits of this novel representation in interactive navigation systems and further propose a method to optimize the partitioning of the navigation domain into independent segments, under bandwidth and storage constraints. Experimental results confirm the potential of the proposed representation: our system achieves compression performance similar to classical inter-view coding, while providing the high level of flexibility required for interactive streaming. Because of these unique properties, our new framework represents a promising solution for 3D data representation in novel interactive multimedia services.
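
One way to read the partition optimization is as a 1D dynamic program over an ordered set of viewpoints: choose contiguous segments minimizing total storage while each segment's transmission cost stays within a bandwidth budget. The sketch below is our simplification under that assumption; seg_cost is a hypothetical callable, not the paper's cost model.

```python
def partition_navigation_domain(n_views, seg_cost, max_seg_bw):
    """Partition a 1D navigation domain of n_views viewpoints into
    contiguous segments, minimizing total storage while keeping each
    segment's transmission cost under a per-segment bandwidth budget.

    seg_cost(i, j): hypothetical storage cost of one segment covering
    viewpoints i..j (reference image + auxiliary information).
    max_seg_bw: per-segment budget, in the same units as seg_cost.
    """
    INF = float("inf")
    best = [0.0] + [INF] * n_views   # best[j] = min cost for views 0..j-1
    cut = [0] * (n_views + 1)
    for j in range(1, n_views + 1):
        for i in range(j):
            c = seg_cost(i, j - 1)
            if c <= max_seg_bw and best[i] + c < best[j]:
                best[j], cut[j] = best[i] + c, i
    # Recover the segment boundaries by walking the cut points backwards.
    segments, j = [], n_views
    while j > 0:
        segments.append((cut[j], j - 1))
        j = cut[j]
    return segments[::-1], best[n_views]

# Example with a toy cost: a fixed reference-image cost plus per-view AI cost.
segs, total = partition_navigation_domain(
    10, lambda i, j: 5.0 + 1.0 * (j - i), max_seg_bw=9.0)
print(segs, total)
```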


3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video | 2012

Adaptive arithmetic coding for point cloud compression

Ismael Daribo; Ryo Furukawa; Ryusuke Sagawa; Hiroshi Kawasaki

Recently, structured-light-based scanning systems have gained in popularity and are capable of modeling entire dense shapes that evolve over time with a single scan (a.k.a. one-shot scan). By projecting a static grid pattern onto the object surface, one-shot shape reconstruction methods can scan moving objects while still maintaining dense reconstruction. However, the amount of 3D data produced by these systems grows rapidly, with point clouds of millions of points. As a consequence, an effective point cloud compression scheme is required to meet transmission needs. In this paper, we propose a new approach to compressing point clouds that takes advantage of the fact that arithmetic coding can be split into two parts: an encoder that actually produces the compressed bitstream, and a modeler that feeds information into the encoder. In particular, for each point position and normal, we propose to calculate a probability distribution based on its spatial prediction and use it as the modeler, whereas classical point cloud coders mainly focus on reducing the prediction residual. Experimental results demonstrate the effectiveness of the proposed method.
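
To illustrate the encoder/modeler split, here is a toy modeler in Python (our assumption, not the paper's statistical model): it builds a probability table for a quantized coordinate, peaked at its spatial prediction, which the arithmetic-coding stage would then consume. The discrete Laplacian shape is chosen purely for illustration.

```python
import numpy as np

def residual_model(prediction, support, scale=2.0):
    """Modeler for an arithmetic coder: probability table for a quantized
    coordinate, peaked at its spatial prediction.

    prediction: predicted coordinate value (e.g., from neighboring points).
    support: iterable of admissible quantized values.
    scale: hypothetical spread of the discrete Laplacian model.
    """
    support = np.asarray(support, dtype=float)
    weights = np.exp(-np.abs(support - prediction) / scale)
    return weights / weights.sum()

# Example: an 8-bit coordinate predicted to be near 100; the encoder half of
# the arithmetic coder would consume this table to code the true value.
table = residual_model(prediction=100.2, support=range(256))
print(table.argmax(), table.max())  # probability mass concentrated near 100
```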


Visual Communications and Image Processing | 2011

Point cloud compression for grid-pattern-based 3D scanning system

Ismael Daribo; Ryo Furukawa; Ryusuke Sagawa; Hiroshi Kawasaki; Shinsaku Hiura; Naoki Asada

It has recently become relatively easy to produce digital point-sampled 3D geometric models. In light of the increasing capability of 3D scanning systems to produce models with millions of points, compression efficiency is of paramount importance. In this paper, we propose a novel competition-based predictive method for single-rate compression of 3D models represented as point clouds. In particular, we target 3D scanning methods based on a grid pattern. The proposed method takes advantage of the pattern's structure of vertical and horizontal lines, by assuming that the object surface is sampled as curves of points. We then designed and implemented a predictive coder driven by this curve-based point representation. Novel prediction techniques are specifically designed for a curve-based cloud of points and compete with one another to achieve high-quality 3D reconstruction. Experimental results demonstrate the effectiveness of the proposed method.
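
A minimal sketch of competition-based prediction along one scan curve (the predictor bank and the residual metric are our choices, not the paper's): each predictor runs on the already decoded history, the one with the smallest residual wins, and the encoder signals the winning predictor id plus the residual.

```python
import numpy as np

# Hypothetical predictor bank for points sampled along one scan curve.
PREDICTORS = {
    "previous": lambda pts: pts[-1],               # repeat the last point
    "linear":   lambda pts: 2 * pts[-1] - pts[-2], # extrapolate the curve
}

def encode_point(history, actual):
    """Competition-based prediction: try every predictor on the decoded
    history and keep the one with the smallest residual. The decoder
    reruns the same predictors, so only (id, residual) must be coded."""
    best_name, best_res = None, None
    for name, predict in PREDICTORS.items():
        res = np.asarray(actual, dtype=float) - predict(history)
        if best_res is None or np.abs(res).sum() < np.abs(best_res).sum():
            best_name, best_res = name, res
    return best_name, best_res

# Example: a nearly straight scan curve, so linear extrapolation wins.
history = [np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.1, 0.0])]
print(encode_point(history, np.array([2.0, 0.2, 0.1])))
```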


Archive | 2013

Effects of Wavelet-Based Depth Video Compression

Ismael Daribo; Hideo Saito; Ryo Furukawa; Shinsaku Hiura; Naoki Asada

Multi-view video (MVV) representation based on depth data, such as multi-view video plus depth (MVD), is emerging as a new type of 3D video communication service. In the meantime, the problem of coding and transmitting depth video arises in addition to that of classical texture video. Depth video is considered key side information for novel view synthesis in MVV systems such as three-dimensional television (3D-TV) and free viewpoint television (FTV). Nonetheless, the influence of depth compression on the synthesized novel view is still a contentious issue. In this chapter, we discuss and investigate the impact of wavelet-based compression of the depth video on the quality of view synthesis. After this analysis, different frameworks are presented to reduce the disturbing effects of depth compression on the synthesized novel view.
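
As a quick way to reproduce the kind of artifacts the chapter analyzes, the sketch below (our illustration; the chapter's codec and parameters differ) wavelet-transforms a synthetic depth map with PyWavelets, hard-thresholds all but the largest coefficients, and reconstructs, so one can inspect where the error lands relative to the depth edge.

```python
import numpy as np
import pywt

def wavelet_compress_depth(depth, wavelet="haar", level=2, keep=0.05):
    """Crude emulation of wavelet-based depth compression: keep roughly the
    largest `keep` fraction of coefficients (hard threshold), reconstruct."""
    coeffs = pywt.wavedec2(depth.astype(float), wavelet, level=level)
    arr, slices = pywt.coeffs_to_array(coeffs)
    thresh = np.quantile(np.abs(arr), 1 - keep)
    arr = pywt.threshold(arr, thresh, mode="hard")
    coeffs = pywt.array_to_coeffs(arr, slices, output_format="wavedec2")
    return pywt.waverec2(coeffs, wavelet)

# Synthetic depth map: foreground square over a sloped background plane.
depth = np.tile(np.linspace(100.0, 200.0, 64), (64, 1))
depth[16:48, 16:48] = 50.0
err = np.abs(wavelet_compress_depth(depth) - depth)
print(err.max(), err.mean())  # inspect how the error clusters near the edge
```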


Digital Image Computing: Techniques and Applications | 2012

Hierarchical requantization of depth data for 3D visual communications

Ismael Daribo

Depth data is recognized as important information in 3D visual communications. While the representation, coding and transmission of depth data are still open problems, depth data quality also strongly depends on its bit precision, i.e., how many bits are used to represent the depth signal. This paper addresses the efficient requantization of n-bit depth data to a lower m-bit representation, in order to be compatible with classical video encoder inputs. The proposed mapping to a lower m-bit precision is carried out through a binary space partition whose construction is based on histogram analysis. The resulting constrained optimization problem can be solved in O(2^m) time. Experimental results show that this new depth requantization strategy leads to a smaller quantization error and hence better synthesized novel views by depth-image-based rendering (DIBR).
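
A simplified sketch of the histogram-driven idea follows; it is a greedy stand-in for the paper's optimized binary space partition, and details such as the median split rule are our assumptions.

```python
import numpy as np

def requantize_depth(depth, m, n_bits=16):
    """Requantize n-bit integer depth to m bits via a greedy binary
    partition of the depth histogram: repeatedly bisect the heaviest cell
    at its weighted median until there are 2**m cells, then map each cell
    to its weighted mean depth."""
    hist = np.bincount(depth.ravel(), minlength=2 ** n_bits)
    cells = [(0, 2 ** n_bits)]                   # half-open [lo, hi) ranges
    for _ in range(2 ** m - 1):                  # 2**m leaves need 2**m - 1 splits
        lo, hi = max((c for c in cells if c[1] - c[0] > 1),
                     key=lambda c: hist[c[0]:c[1]].sum())
        cum = np.cumsum(hist[lo:hi])
        split = lo + int(np.searchsorted(cum, cum[-1] / 2)) + 1
        split = min(max(split, lo + 1), hi - 1)  # keep both halves non-empty
        cells.remove((lo, hi))
        cells += [(lo, split), (split, hi)]
    lut = np.zeros(2 ** n_bits)                  # per-level reconstruction value
    for lo, hi in cells:
        w = hist[lo:hi]
        lut[lo:hi] = (w * np.arange(lo, hi)).sum() / max(w.sum(), 1)
    return lut[depth]

# Bimodal depth (foreground ~5000, background ~40000) stays well resolved
# even at m = 3 bits, since splits concentrate where the histogram mass is.
rng = np.random.default_rng(0)
depth = np.concatenate([rng.normal(5000, 100, 2000),
                        rng.normal(40000, 400, 2000)]).astype(np.int64)
depth = np.clip(depth, 0, 2 ** 16 - 1)
print(np.abs(requantize_depth(depth, m=3) - depth).mean())
```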


Pacific-Rim Symposium on Image and Video Technology | 2011

Dynamic compression of curve-based point cloud

Ismael Daribo; Ryo Furukawa; Ryusuke Sagawa; Hiroshi Kawasaki; Shinsaku Hiura; Naoki Asada

With the increasing demand for highly detailed 3D data, dynamic scanning systems can now produce 3D+t (a.k.a. 4D) spatio-temporal models with millions of points. As a consequence, effective 4D geometry compression schemes are required to meet the need to store/transmit this huge amount of data, in addition to classical static 3D data. In this paper, we propose a 4D spatio-temporal point cloud encoder built on a curve-based representation of the point cloud, particularly well suited for dynamic structured-light-based scanning systems wherein a grid pattern is projected onto the object surface. Due to the grid pattern, the object surface is naturally sampled as a series of curves. This motivates our choice to leverage a curve-based representation to remove the spatial and temporal correlation of the sampled points along the scanning directions, through a competition-based predictive encoder that includes different spatio-temporal prediction modes. Experimental results show the significant gain obtained with the proposed method.
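
Relative to the single-frame coder sketched under the VCIP 2011 entry above, the dynamic version adds temporal candidates to the predictor competition. A minimal sketch under that reading (the mode names and the simple predictors are our assumptions):

```python
import numpy as np

# Hypothetical predictor bank mixing spatial and temporal modes for a
# point sampled at index i on a scan curve at frame t.
def spatial_prev(curve_t, i, curve_prev):
    return curve_t[i - 1]                       # previous point, same curve

def spatial_linear(curve_t, i, curve_prev):
    return 2 * curve_t[i - 1] - curve_t[i - 2]  # extrapolate along the curve

def temporal(curve_t, i, curve_prev):
    return curve_prev[i]                        # same index, previous frame

MODES = {"s_prev": spatial_prev, "s_lin": spatial_linear, "t": temporal}

def best_mode(curve_t, i, curve_prev):
    """Competition between spatio-temporal predictors: return the mode id
    and residual the encoder would signal for point i of the current frame."""
    res = {m: curve_t[i] - f(curve_t, i, curve_prev) for m, f in MODES.items()}
    mode = min(res, key=lambda m: np.abs(res[m]).sum())
    return mode, res[mode]

# Example: a curved scan line that barely moves between frames, so the
# temporal predictor wins with a residual of ~0.05.
t0 = np.array([[0.0, 0.0, 0.0], [1.0, 0.5, 0.0], [1.5, 1.5, 0.0]])
t1 = t0 + [0.0, 0.05, 0.0]
print(best_mode(t1, 2, t0))
```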

Collaboration


Ismael Daribo's main collaborators and their affiliations.

Top Co-Authors

Ryo Furukawa, Hiroshima City University
Gene Cheung, National Institute of Informatics
Naoki Asada, Hiroshima City University
Shinsaku Hiura, Hiroshima City University
Ryusuke Sagawa, National Institute of Advanced Industrial Science and Technology
Pascal Frossard, École Polytechnique Fédérale de Lausanne
Thomas Maugey, École Polytechnique Fédérale de Lausanne