Aljoscha Smolic
Disney Research
Publication
Featured research published by Aljoscha Smolic.
IEEE Transactions on Circuits and Systems for Video Technology | 2007
Philipp Merkle; Aljoscha Smolic; Karsten Müller; Thomas Wiegand
An experimental analysis of multiview video coding (MVC) for various temporal and inter-view prediction structures is presented. The compression method is based on the multiple reference picture technique in the H.264/AVC video coding standard. The idea is to exploit the statistical dependencies from both temporal and inter-view reference pictures for motion-compensated prediction. The effectiveness of this approach is demonstrated by an experimental analysis of temporal versus inter-view prediction in terms of the Lagrange cost function. The results show that prediction with temporal reference pictures is highly efficient, but for 20% of a picture's blocks on average, prediction with reference pictures from adjacent views is more efficient. Hierarchical B pictures are used as the basic structure for temporal prediction. Their advantages are combined with inter-view prediction for different temporal hierarchy levels, ranging from simulcast coding with no inter-view prediction up to full-level inter-view prediction. When using inter-view prediction at key picture temporal levels, average gains of 1.4 dB peak signal-to-noise ratio (PSNR) are reported; when additionally using inter-view prediction at nonkey picture temporal levels, average gains of 1.6 dB PSNR are reported. For some cases, gains of more than 3 dB, corresponding to bit-rate savings of up to 50%, are obtained.
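The temporal-versus-inter-view decision described above can be illustrated with a small sketch. The following Python snippet only shows a Lagrangian cost comparison J = D + λR between one temporal and one inter-view prediction candidate for a single block; the SAD distortion, the bit counts, and the λ value are assumptions made for the example, not values from the paper or from H.264/AVC reference software.

```python
# Minimal sketch (not the paper's codec): choosing between a temporal and an
# inter-view reference picture for one block via the Lagrangian cost J = D + lambda * R.
# The SAD distortion, bit-count estimates and lambda are illustrative assumptions.
import numpy as np

def sad(block, candidate):
    """Sum of absolute differences, used as the distortion term D."""
    return float(np.abs(block.astype(np.int32) - candidate.astype(np.int32)).sum())

def lagrangian_cost(block, candidate, rate_bits, lam):
    """J = D + lambda * R for one prediction candidate."""
    return sad(block, candidate) + lam * rate_bits

def choose_reference(block, temporal_pred, interview_pred, lam=50.0):
    """Return whichever reference (temporal or inter-view) yields the lower cost.
    The rate estimates (assumed here) would come from actual entropy coding."""
    j_temporal  = lagrangian_cost(block, temporal_pred,  rate_bits=20, lam=lam)
    j_interview = lagrangian_cost(block, interview_pred, rate_bits=24, lam=lam)
    return ("temporal", j_temporal) if j_temporal <= j_interview else ("inter-view", j_interview)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    block = rng.integers(0, 256, (16, 16), dtype=np.uint8)
    temporal = np.clip(block.astype(np.int32) + rng.integers(-3, 4, (16, 16)), 0, 255)
    interview = np.clip(block.astype(np.int32) + rng.integers(-8, 9, (16, 16)), 0, 255)
    print(choose_reference(block, temporal, interview))
```

In a real encoder this comparison is carried out over all macroblock modes and reference pictures, with λ derived from the quantization parameter; the sketch only shows the shape of the decision.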
International Conference on Image Processing | 2007
Philipp Merkle; Aljoscha Smolic; Karsten Müller; Thomas Wiegand
A study of the video plus depth representation for multi-view video sequences is presented. Such a 3D representation enables functionalities like 3D television and free viewpoint video. Compression is based on algorithms for multi-view video coding, which exploit statistical dependencies from both temporal and inter-view reference pictures for prediction of both color and depth data. Coding efficiency of prediction structures with and without inter-view reference pictures is analyzed for multi-view video plus depth data, reporting gains in luma PSNR of up to 0.5 dB for depth and 0.3 dB for color. The main benefit of using a multi-view video plus depth representation is that intermediate views can be easily rendered. Therefore, the impact of compression on the image quality of arbitrary rendered intermediate views is investigated and analyzed in a second part, comparing compressed multi-view video plus depth data at different bit rates with the uncompressed original.
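As a rough illustration of how per-pixel depth makes intermediate view rendering possible, the sketch below converts an 8-bit depth map into pixel disparities for a nearby virtual camera. It assumes the common inverse-depth quantization between a near and a far clipping plane used in video-plus-depth formats; the camera parameters and clipping planes are illustrative values, not parameters from this study.

```python
# Illustrative sketch only: converting an 8-bit depth map (inverse-depth
# quantization between z_near and z_far, as commonly used for video plus depth)
# into pixel disparities for a virtual camera at baseline b and focal length f.
# All parameter values below are assumptions, not taken from the paper.
import numpy as np

def depth_value_to_metric_depth(v, z_near=0.5, z_far=10.0):
    """Map 8-bit depth values (255 = nearest) back to metric depth z."""
    v = v.astype(np.float64) / 255.0
    return 1.0 / (v * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far)

def disparity_for_virtual_view(depth_map, f=1000.0, b=0.05):
    """Horizontal disparity (in pixels) induced by shifting the camera by b."""
    z = depth_value_to_metric_depth(depth_map)
    return f * b / z

if __name__ == "__main__":
    depth_map = np.full((4, 4), 128, dtype=np.uint8)   # a mid-range depth plane
    print(disparity_for_virtual_view(depth_map))
```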
Signal Processing: Image Communication | 2007
Peter Kauff; Nicole Atzpadin; Christoph Fehn; Marcus Müller; Oliver Schreer; Aljoscha Smolic; Ralf Tanger
Due to enormous progress in the areas of auto-stereoscopic 3D displays, digital video broadcast and computer vision algorithms, 3D television (3DTV) has reached a high level of technical maturity and many people now believe that it is ready for the market. Experimental prototypes of entire 3DTV processing chains have been demonstrated successfully during the last few years, and the Moving Picture Experts Group (MPEG) of ISO/IEC has launched related ad hoc groups and standardization efforts envisaging the emerging market segment of 3DTV. In this context, the paper discusses an advanced approach for a 3DTV service, which is based on the concept of video-plus-depth data representations. It particularly considers aspects of interoperability and multi-view adaptation for the case that different multi-baseline geometries are used for multi-view capturing and 3D display. Furthermore, it presents algorithmic solutions for the creation of depth maps and depth image-based rendering related to this framework of multi-view adaptation. In contrast to other proposals, which are more focused on specialized configurations, the underlying approach provides a modular and flexible system architecture supporting a wide range of multi-view structures.
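To make the idea of depth image-based rendering concrete, here is a deliberately simplified forward-warping sketch: one color image plus a per-pixel disparity map is shifted horizontally toward a virtual viewpoint, with closer pixels winning occlusion conflicts. It assumes rectified views, nearest-pixel warping and no hole filling, so it illustrates the principle rather than the algorithms presented in the paper.

```python
# Minimal, simplified depth-image-based rendering sketch (assumptions: purely
# horizontal rectified shift, nearest-pixel forward warping, no hole filling).
import numpy as np

def forward_warp(color, disparity, alpha=0.5):
    """Warp `color` to a virtual view located at fraction `alpha` of the baseline.
    Pixels with larger disparity (closer to the camera) win occlusion conflicts."""
    h, w = disparity.shape
    virtual = np.zeros_like(color)
    depth_buffer = np.full((h, w), -np.inf)           # stores the winning disparity
    for y in range(h):
        for x in range(w):
            d = alpha * disparity[y, x]
            xt = int(round(x - d))                    # shift along the baseline
            if 0 <= xt < w and disparity[y, x] > depth_buffer[y, xt]:
                depth_buffer[y, xt] = disparity[y, x]
                virtual[y, xt] = color[y, x]
    return virtual

if __name__ == "__main__":
    color = np.arange(5 * 8 * 3, dtype=np.uint8).reshape(5, 8, 3)
    disparity = np.full((5, 8), 2.0)
    print(forward_warp(color, disparity).shape)
```

The holes left by disocclusions and the choice of interpolation are exactly the problems that the depth-map creation and rendering algorithms discussed in the paper are designed to handle.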
International Conference on Computer Graphics and Interactive Techniques | 2010
Manuel Lang; Alexander Hornung; Oliver Wang; Steven Poulakos; Aljoscha Smolic; Markus H. Gross
This paper addresses the problem of remapping the disparity range of stereoscopic images and video. Such operations are highly important for a variety of issues arising from the production, live broadcast, and consumption of 3D content. Our work is motivated by the observation that the displayed depth and the resulting 3D viewing experience are dictated by a complex combination of perceptual, technological, and artistic constraints. We first discuss the most important perceptual aspects of stereo vision and their implications for stereoscopic content creation. We then formalize these insights into a set of basic disparity mapping operators. These operators enable us to control and retarget the depth of a stereoscopic scene in a nonlinear and locally adaptive fashion. To implement our operators, we propose a new strategy based on stereoscopic warping of the input video streams. From a sparse set of stereo correspondences, our algorithm computes disparity and image-based saliency estimates, and uses them to compute a deformation of the input views so as to meet the target disparities. Our approach represents a practical solution for actual stereo production and display that does not require camera calibration, accurate dense depth maps, occlusion handling, or inpainting. We demonstrate the performance and versatility of our method using examples from live action post-production, 3D display size adaptation, and live broadcast. An additional user study and ground truth comparison further provide evidence for the quality and practical relevance of the presented work.
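As a toy illustration of a global disparity mapping operator (not the operators or implementation from the paper), the snippet below compresses an input disparity range nonlinearly with a logarithmic curve and rescales it into an assumed target comfort range; in the actual pipeline such target disparities would then drive the stereoscopic warping of the input views.

```python
# Toy illustration of a global, nonlinear disparity mapping operator.
# The log curve and the range limits are assumptions made for the example.
import numpy as np

def remap_disparity(d, d_min_in, d_max_in, d_min_out, d_max_out):
    """Map disparities nonlinearly from [d_min_in, d_max_in] to [d_min_out, d_max_out].
    Large disparities are compressed more strongly than small ones."""
    t = (d - d_min_in) / (d_max_in - d_min_in)        # normalize to [0, 1]
    t = np.log1p(9.0 * t) / np.log(10.0)              # nonlinear compression, still in [0, 1]
    return d_min_out + t * (d_max_out - d_min_out)

if __name__ == "__main__":
    sparse_disparities = np.array([-40.0, -10.0, 0.0, 25.0, 80.0])   # from sparse correspondences
    print(remap_disparity(sparse_disparities, -40.0, 80.0, -15.0, 15.0))
```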
IEEE Transactions on Circuits and Systems for Video Technology | 2007
Aljoscha Smolic; Karsten Mueller; Nikolce Stefanoski; Joern Ostermann; Atanas Gotchev; Gozde Bozdagi Akar; Georgios Triantafyllidis; Alper Koz
Research efforts on 3DTV technology have been strengthened worldwide recently, covering the whole media processing chain from capture to display. Different 3DTV systems rely on different 3D scene representations that integrate various types of data. Efficient coding of these data is crucial for the success of 3DTV. Compression of pixel-type data including stereo video, multiview video, and associated depth or disparity maps extends available principles of classical video coding. Powerful algorithms and open international standards for multiview video coding and coding of video plus depth data are available and under development, which will provide the basis for introduction of various 3DTV systems and services in the near future. Compression of 3D mesh models has also reached a high level of maturity. For static geometry, a variety of powerful algorithms are available to efficiently compress vertices and connectivity. Compression of dynamic 3D geometry is currently a more active field of research. Temporal prediction is an important mechanism to remove redundancy from animated 3D mesh sequences. Error resilience is important for transmission of data over error-prone channels, and multiple description coding (MDC) is a suitable way to protect data. MDC of still images and 2D video has already been widely studied, whereas multiview video and 3D meshes have been addressed only recently. Intellectual property protection of 3D data by watermarking is a pioneering research area as well. The 3D watermarking methods in the literature are classified into three groups, considering the dimensions of the main components of scene representations and the resulting components after applying the algorithm. In general, 3DTV coding technology is maturing. Systems and services may enter the market in the near future. However, the research area is relatively young compared to coding of other types of media. Therefore, there is still a lot of room for improvement and for the development of new algorithms.
IEEE Signal Processing Magazine | 2007
Akira Kubota; Aljoscha Smolic; Marcus A. Magnor; Masayuki Tanimoto; Tsuhan Chen; Cha Zhang
The eight articles in this special section are devoted to multi-view imaging and three-dimensional television displays.
International Conference on Multimedia and Expo | 2006
Aljoscha Smolic; Karsten Mueller; Philipp Merkle; Christoph Fehn; Peter Kauff; Peter Eisert; Thomas Wiegand
An overview of 3D and free viewpoint video is given in this paper, with special focus on related standardization activities in MPEG. Free viewpoint video allows the user to freely navigate within real-world visual scenes, as known from virtual worlds in computer graphics. Examples are shown, highlighting standards-conformant realizations using MPEG-4. Then the principles of 3D video, which provides the user with a 3D depth impression of the observed scene, are introduced. Example systems are described, again focusing on their realization based on MPEG-4. Finally, multi-view video coding is described as a key component for 3D and free viewpoint video systems. The conclusion is that the necessary technology, including standard media formats for 3D and free viewpoint video, is available or will be available in the near future, and that there is a clear demand from industry and users for such applications. 3D TV at home and free viewpoint video on DVD will be available soon and will create huge new markets.
Signal Processing: Image Communication | 2009
Philipp Merkle; Yannick Morvan; Aljoscha Smolic; Dirk Farin; Karsten Müller; Thomas Wiegand
This article investigates the interaction between different techniques for depth compression and view synthesis rendering with multiview video plus scene depth data. Two different approaches for depth coding are compared, namely H.264/MVC, using temporal and inter-view reference images for efficient prediction, and the novel platelet-based coding algorithm, characterized by being adapted to the special characteristics of depth images. Since depth images are a 2D representation of the 3D scene geometry, depth-image errors lead to geometry distortions. Therefore, the influence of geometry distortions resulting from coding artifacts is evaluated for both coding approaches in two different ways. First, the variation of 3D surface meshes is analyzed using the Hausdorff distance, and second, the distortion is evaluated for 2D view synthesis rendering, where color and depth information are used together to render virtual intermediate camera views of the scene. The results show that, although its rate-distortion (R-D) performance is worse, platelet-based depth coding outperforms H.264 with respect to the quality of rendered views, due to its improved preservation of sharp edges. Therefore, depth coding needs to be evaluated with respect to geometry distortions.
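For readers unfamiliar with the metric, the snippet below is a brute-force sketch of the symmetric Hausdorff distance between two 3D vertex sets, one simple way to quantify how far a decoded, distorted surface deviates from the original. Production mesh-comparison tools sample the surfaces densely rather than comparing vertices only, so this is meant only to illustrate the definition.

```python
# Hedged sketch: brute-force symmetric Hausdorff distance between two 3D
# vertex sets (N x 3 and M x 3 arrays), illustrating the definition only.
import numpy as np

def directed_hausdorff(a, b):
    """Max over points in `a` of the distance to the closest point in `b`."""
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)  # pairwise distances
    return dists.min(axis=1).max()

def hausdorff(a, b):
    """Symmetric Hausdorff distance between point sets a and b."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

if __name__ == "__main__":
    original = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0]])
    distorted = original + np.array([[0.0, 0, 0.1], [0, 0, -0.05], [0, 0, 0.2]])
    print(hausdorff(original, distorted))   # 0.2 for this toy example
```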
Proceedings of the IEEE | 2005
Aljoscha Smolic; Peter Kauff
Interactivity, in the sense of being able to explore and navigate audio-visual scenes by freely choosing viewpoint and viewing direction, is a key feature of new and emerging audio-visual media. This paper gives an overview of suitable technology for such applications, with a focus on international standards, which are beneficial for consumers, service providers, and manufacturers. We first give a general classification and overview of interactive scene representation formats as commonly used in the computer graphics literature. Then, we describe popular standard formats for interactive three-dimensional (3-D) scene representation and creation of virtual environments, the Virtual Reality Modeling Language (VRML) and the MPEG-4 BInary Format for Scenes (BIFS), with some examples. Recent extensions to MPEG-4 BIFS, the Animation Framework eXtension (AFX), providing advanced computer graphics tools, are explained and illustrated. New technologies mainly targeted at reconstruction, modeling, and representation of dynamic real-world scenes are further studied. The user shall be able to navigate photorealistic scenes within certain restrictions, which can be roughly defined as 3-D video. Omnidirectional video is an extension of the planar two-dimensional (2-D) image plane to a spherical or cylindrical image plane. Any 2-D view in any direction can be rendered from this overall recording to give the user the impression of looking around. In interactive stereo, two views, one for each eye, are synthesized to provide the user with an adequate depth cue of the observed scene. Head motion parallax viewing can be supported in a certain operating range if sufficient depth or disparity data are delivered with the video data. In free viewpoint video, a dynamic scene is captured by a number of cameras. The input data are transformed into a special data representation that enables interactive navigation through the dynamic scene environment.
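The omnidirectional rendering idea mentioned above can be sketched in a few lines. Assuming the recording is stored as an equirectangular panorama (an assumption for this example), each pixel of a virtual pinhole view is mapped to a ray direction, rotated by the chosen viewing direction, converted to longitude/latitude and sampled from the panorama; yaw-only rotation and nearest-neighbor sampling are simplifications chosen for brevity.

```python
# Simplified sketch: rendering a 2D pinhole view in an arbitrary direction
# from an equirectangular panorama (yaw rotation only, nearest-neighbor sampling).
import numpy as np

def render_view(panorama, yaw_deg=0.0, fov_deg=90.0, out_size=(240, 320)):
    """Render a pinhole view looking in direction `yaw_deg` from an equirectangular image."""
    pano_h, pano_w = panorama.shape[:2]
    out_h, out_w = out_size
    f = 0.5 * out_w / np.tan(np.radians(fov_deg) / 2.0)   # focal length in pixels

    # Ray direction for every output pixel (camera looks along +z).
    xs = np.arange(out_w) - out_w / 2.0
    ys = np.arange(out_h) - out_h / 2.0
    x, y = np.meshgrid(xs, ys)
    z = np.full_like(x, f)

    # Rotate the rays around the vertical axis by the chosen yaw.
    yaw = np.radians(yaw_deg)
    xr = np.cos(yaw) * x + np.sin(yaw) * z
    zr = -np.sin(yaw) * x + np.cos(yaw) * z

    # Convert rays to longitude/latitude, then to panorama pixel coordinates.
    lon = np.arctan2(xr, zr)                          # in [-pi, pi]
    lat = np.arctan2(y, np.hypot(xr, zr))             # in [-pi/2, pi/2]
    u = ((lon + np.pi) / (2 * np.pi) * (pano_w - 1)).astype(int)
    v = ((lat + np.pi / 2) / np.pi * (pano_h - 1)).astype(int)
    return panorama[v, u]

if __name__ == "__main__":
    pano = np.random.default_rng(0).integers(0, 256, (256, 512, 3), dtype=np.uint8)
    print(render_view(pano, yaw_deg=45.0).shape)      # (240, 320, 3)
```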
EURASIP Journal on Image and Video Processing | 2008
Karsten Müller; Aljoscha Smolic; Kristina Dix; Philipp Merkle; Peter Kauff; Thomas Wiegand
Interest in 3D video applications and systems is growing rapidly and the technology is maturing. It is expected that multiview autostereoscopic displays will play an important role in home user environments, since they support multiuser 3D sensation and motion parallax impression. The tremendous data rate cannot be handled efficiently by representation and coding formats such as MVC or MPEG-C Part 3. Multiview video plus depth (MVD) is a new format that efficiently supports such advanced 3DV systems, but it requires high-quality intermediate view synthesis. For this, a new approach is presented that separates unreliable image regions along depth discontinuities from reliable image regions; these are treated separately and fused into the final interpolated view. In contrast to previous layered approaches, our algorithm uses two boundary layers and one reliable layer, performs image-based 3D warping only, and was implemented generically, that is, it does not necessarily rely on 3D graphics support. Furthermore, different hole-filling and filtering methods are added to provide high-quality intermediate views. As a result, high-quality intermediate views for an existing 9-view auto-stereoscopic display as well as for other stereoscopic and multiscopic displays are presented, which prove the suitability of our approach for advanced 3DV systems.
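As a rough sketch of the first step in such a layered approach (not the authors' implementation), the code below marks a band of pixels around significant depth discontinuities as an unreliable boundary-layer mask, using a simple gradient threshold on the depth map followed by a small dilation; the threshold and margin are illustrative values.

```python
# Hedged sketch: marking unreliable image regions along depth discontinuities.
# Threshold and margin are illustrative assumptions, not values from the paper.
import numpy as np

def boundary_layer_mask(depth, threshold=10, margin=2):
    """Return a boolean mask that is True near significant depth discontinuities."""
    d = depth.astype(np.int32)
    grad_x = np.abs(np.diff(d, axis=1, prepend=d[:, :1]))
    grad_y = np.abs(np.diff(d, axis=0, prepend=d[:1, :]))
    edges = np.maximum(grad_x, grad_y) > threshold

    # Dilate the edge mask by `margin` pixels so that a band around each
    # discontinuity is treated as unreliable during view synthesis.
    mask = edges.copy()
    for shift in range(1, margin + 1):
        for axis in (0, 1):
            mask |= np.roll(edges, shift, axis=axis)
            mask |= np.roll(edges, -shift, axis=axis)
    return mask

if __name__ == "__main__":
    depth = np.zeros((6, 10), dtype=np.uint8)
    depth[:, 5:] = 200                                # a sharp depth discontinuity
    print(boundary_layer_mask(depth).astype(int))
```

Pixels inside the mask would go into the boundary layers and be warped and blended separately, while the remaining pixels form the reliable layer; the hole-filling and filtering steps described in the paper then operate on the fused result.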