Pablo Carballeira
Technical University of Madrid
Publications
Featured research published by Pablo Carballeira.
IEEE Journal of Selected Topics in Signal Processing | 2012
Pablo Carballeira; Julián Cabrera; Antonio Ortega; Fernando Jaureguizar; Narciso N. García
We present a novel framework for the analysis and optimization of encoding latency for multiview video. First, we characterize the elements that influence encoding latency performance: 1) the multiview prediction structure and 2) the hardware encoder model. Then, we provide algorithms to find the encoding latency of any arbitrary multiview prediction structure. The proposed framework relies on the directed acyclic graph encoder latency (DAGEL) model, which provides an abstraction of the processing capacity of the encoder by considering an unbounded number of processors. Using graph-theoretic algorithms, the DAGEL model allows us to compute the encoding latency of a given prediction structure and determine the contribution of the prediction dependencies to it. As an example application of the DAGEL model, we propose an algorithm to reduce the encoding latency of a given multiview prediction structure down to a target value. In our approach, a minimum number of frame dependencies are pruned until the latency target is achieved, thus minimizing the degradation of the rate-distortion performance due to the removal of prediction dependencies. Finally, we analyze the latency performance of the DAGEL-derived prediction structures in multiview encoders with limited processing capacity.
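As a rough illustration of the graph-theoretic idea behind the DAGEL model, the sketch below treats frames as nodes of a DAG, prediction dependencies as edges, and estimates latency as the longest chain of frame processing times. The toy prediction structure, the frame names, and the assumption that a frame can start encoding as soon as all its references finish (unbounded processors, capture instants ignored) are illustrative simplifications, not the paper's exact formulation.

```python
# Toy latency estimate in the spirit of the DAGEL model: frames are DAG nodes,
# prediction dependencies are edges, and the latency is driven by the longest
# chain of frame processing times (unbounded processors, capture times ignored).
from graphlib import TopologicalSorter  # Python 3.9+

def encoding_latency(proc_time, references):
    """proc_time: {frame: encoding time}; references: {frame: [frames it predicts from]}.
    Every frame must appear as a key of both dictionaries."""
    finish = {}
    for f in TopologicalSorter(references).static_order():  # references come first
        start = max((finish[r] for r in references.get(f, [])), default=0.0)
        finish[f] = start + proc_time[f]
    return max(finish.values()), finish  # overall latency estimate, per-frame finish times

# Hypothetical 2-view, 2-frame structure: view 1 and later frames depend on earlier ones.
refs = {"V0F0": [], "V1F0": ["V0F0"], "V0F1": ["V0F0"], "V1F1": ["V1F0", "V0F1"]}
times = {k: 1.0 for k in refs}
print(encoding_latency(times, refs))  # longest dependency chain here is 3 frames -> 3.0
```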
3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video | 2010
Pablo Carballeira; Gerhard Tech; Julián Cabrera; Karsten Müller; Fernando Jaureguizar; Thomas Wiegand; N. García
We present a preliminary study on the Rate-Distortion (RD) gain that can be achieved by applying RD optimization techniques in a multiview-plus-depth encoder. We consider the use of Multiview Video Coding (MVC) for both color and depth sequences, and evaluate the improvement that can be obtained by allowing a quantization parameter (QP) assignment on a macroblock basis, compared to the use of a fixed QP for the whole sequence. The optimization criterion is the minimization of the distortion of the synthesized views generated at the receiver. Our motivation for this criterion is to capture the impact of depth coding according to its final purpose: the generation of virtual views. Since no single objective quality metric for the evaluation of view synthesis artifacts has been established yet, the performance of several algorithms for quality evaluation of the target synthesized view has been compared. Beyond obtaining a better RD performance, as could be expected, the results also show that the optimized synthesized views achieve lower absolute distortion values than the best result of the approach that uses a fixed QP for the whole sequence.
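A minimal sketch of the kind of macroblock-level QP search the abstract describes, assuming a standard Lagrangian RD cost in which the distortion term is measured on the synthesized view; encode_mb() and synthesis_distortion() are hypothetical hooks into an encoder and a view-synthesis module, not functions from the paper or any real codec API.

```python
# Hypothetical per-macroblock QP search: choose, for each macroblock, the QP that
# minimizes a Lagrangian cost whose distortion term is computed on the synthesized
# view. encode_mb() and synthesis_distortion() are placeholder callables supplied
# by the caller.
def select_qp_per_mb(macroblocks, candidate_qps, lam, encode_mb, synthesis_distortion):
    decisions = []
    for mb in macroblocks:
        best_cost, best_qp = float("inf"), None
        for qp in candidate_qps:
            bits, reconstruction = encode_mb(mb, qp)          # rate and decoded block
            dist = synthesis_distortion(mb, reconstruction)   # error measured on the virtual view
            cost = dist + lam * bits                          # standard Lagrangian RD cost
            if cost < best_cost:
                best_cost, best_qp = cost, qp
        decisions.append(best_qp)
    return decisions
```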
Visual Communications and Image Processing | 2009
Pablo Carballeira; Julián Cabrera; Antonio Ortega; Fernando Jaureguizar; Narciso N. García
We present a novel analysis of encoding latency for arbitrary multiview video coding prediction structures. This analysis allows us to compare the efficiency of multiview prediction structures beyond rate-distortion performance, i.e., it makes it possible to select those structures that are more amenable to low-latency encoding. As a result, we have developed a general framework (models and tools) for the characterization of latency that can be used for arbitrary dependency structures. Finally, we have focused on the JMVM prediction structure with the aim of deriving new structures with similar rate-distortion performance and lower latency. Experimental results illustrate the importance of taking encoding latency into consideration when selecting a prediction structure.
IEEE Journal of Selected Topics in Signal Processing | 2017
Pablo Carballeira; Jesús Gutiérrez; Francisco Morán; Julián Cabrera; Fernando Jaureguizar; Narciso N. García
Super MultiView (SMV) video display is the most promising technology for 3-D glasses-free visualization. Although only a few prototypes are currently available, research on the technical and perceptual factors related to this approach is crucial. This paper presents a novel model to capture the subjective perception of SMV, called the MultiView Perceptual Disparity Model (MVPDM), by means of a parametrization of the relation between: 1) capture and scene settings, and 2) the perceived speed comfort and smoothness of the viewpoint transitions. The MVPDM is based on a novel parameter, the perceptual disparity, which appropriately captures the perceptual cues specific to SMV visualization. The model has been validated using the results of subjective tests on realistic SMV content as a benchmark. On the one hand, the subjective results show a high correlation with the MVPDM parametrization, outperforming previous approaches. On the other hand, the tests provide useful information about the parameters of SMV sequences that should be used to guarantee a satisfactory visual experience. Thus, the MVPDM constitutes a valuable tool for the design of subjective evaluations and the creation of SMV content.
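For illustration only, the snippet below shows how a parametrization such as the MVPDM is typically checked against subjective scores, using Pearson and Spearman correlation as figures of merit; the numeric values are made-up placeholders, not data or results from the paper.

```python
# Illustration of a standard model-vs-subjective-scores check using Pearson and
# Spearman correlation; the arrays are made-up placeholders, not MVPDM results.
import numpy as np
from scipy.stats import pearsonr, spearmanr

model_prediction = np.array([1.2, 2.1, 2.9, 3.8, 4.6])  # hypothetical model outputs
mos = np.array([1.5, 2.0, 3.1, 3.9, 4.4])               # hypothetical mean opinion scores

pcc, _ = pearsonr(model_prediction, mos)
srocc, _ = spearmanr(model_prediction, mos)
print(f"PCC={pcc:.3f}  SROCC={srocc:.3f}")
```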
Quality of Multimedia Experience | 2015
Pablo Carballeira; Jesús Gutiérrez; Francisco Morán; Julián Cabrera; Narciso N. García
We present preliminary experiments on the subjective evaluation of Super Multiview Video (SMV) on stereoscopic and auto-stereoscopic displays. SMV displays require a large number of views (typically 80 or more), but are not yet widely available. Subjective evaluation on legacy displays, though not optimal, will therefore be necessary for the development of SMV video technologies. This has led us to perform a standardized subjective evaluation of uncompressed SMV test sequences, simulating SMV displays through a view sweep controlled by three parameters: View-Sweep Speed (VSS), Viewing Range, and View Density (VD). In our analysis we have identified the ranges of VSS and VD values that provide a comfortable view sweep with smooth transitions between views.
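The sketch below shows one simple way such a view sweep could be simulated on a conventional display: a subset of views inside a viewing range is shown back and forth, with a view step standing in for view density and a frames-per-view count standing in for sweep speed. These parameter definitions are a simplification of the VSS, Viewing Range and VD settings named in the abstract, not the exact test setup.

```python
# Toy view-sweep scheduler: sweeps back and forth over a viewing range, taking
# every `view_step`-th view (a crude stand-in for view density) and holding each
# view for `frames_per_view` output frames (a crude stand-in for sweep speed).
def view_sweep(num_views, view_range, view_step, frames_per_view):
    first, last = view_range
    assert 0 <= first <= last < num_views
    forward = list(range(first, last + 1, view_step))
    cycle = forward + forward[-2:0:-1]           # ping-pong sweep without repeating endpoints
    for view in cycle:
        for _ in range(frames_per_view):
            yield view                           # index of the view to display on this frame

# Example: 80-view content, sweep views 10..70 taking every 2nd view, 3 frames per view.
schedule = list(view_sweep(80, (10, 70), 2, 3))
```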
3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video | 2011
Pablo Carballeira; Julián Cabrera; Antonio Ortega; Fernando Jaureguizar; Narciso N. García
We present a novel framework for the encoding latency analysis of arbitrary multiview video coding prediction structures. This framework avoids the need to consider a specific encoder architecture for encoding latency analysis by assuming an unlimited processing capacity in the multiview encoder. Under this assumption, only the influence of the prediction structure and of the frame processing times has to be considered, and the encoding latency is computed systematically by means of a graph model. The results obtained with this model are valid for a multiview encoder with sufficient processing capacity and serve as a lower bound otherwise. Furthermore, with the objective of designing low-latency encoders with a low rate-distortion penalty, the graph model allows us to identify the prediction relationships that contribute most to the encoding latency. Experimental results for JMVM prediction structures illustrate how low-latency prediction structures with a low rate-distortion penalty can be derived in a systematic manner using the new model.
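As a complement to the latency estimate sketched earlier, the toy function below illustrates how a graph model can expose the prediction dependencies that lie on the latency-critical path, i.e., the natural candidates to relax when deriving low-latency structures. It again assumes unbounded processing capacity and is an illustrative simplification, not the paper's algorithm.

```python
# Toy critical-path extraction on a frame-dependency DAG: returns the chain of
# frames that determines the latency estimate, so the dependencies along it are
# the first candidates to prune. Assumes unbounded processors.
def critical_path(proc_time, references):
    """references must list every frame as a key (empty list for intra frames)."""
    finish, best_pred, remaining = {}, {}, dict(references)
    while remaining:                                      # simple topological sweep (DAG assumed)
        for f in [f for f, deps in remaining.items() if all(d in finish for d in deps)]:
            deps = remaining.pop(f)
            pred = max(deps, key=lambda d: finish[d], default=None)
            finish[f] = (finish[pred] if pred is not None else 0.0) + proc_time[f]
            best_pred[f] = pred
    node = max(finish, key=finish.get)                    # frame that finishes last
    path = [node]
    while best_pred[node] is not None:
        node = best_pred[node]
        path.append(node)
    return list(reversed(path)), max(finish.values())     # critical chain and latency estimate
```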
Signal Processing: Image Communication | 2016
Pablo Carballeira; Julián Cabrera; Fernando Jaureguizar; Narciso N. García
Aiming for 3D Video encoders with reduced computational complexity, we analyze the performance of depth-shift distortion, i.e., the distortion incurred in depth-image-based rendering algorithms when coding depth maps in 3D Video, as an estimator of the distortion of synthesized views. We propose several distortion models that capture (i) the geometric distortion caused by the depth coding error, (ii) the pixel-mapping precision in view synthesis, and (iii) the method to aggregate the depth-shift distortion caused by the coding error in a depth block. Our analysis starts with the evaluation of the correlation between the depth-shift distortion values obtained with these models and the actual distortion on synthesized views, with the aim of identifying the most accurate one. The correlation results show that one of the models can be used as a reasonable estimator of the synthesis distortion in low-complexity depth encoders. These results also show that the Sum of Absolute Error (SAE) captures the distortion on a depth block better than the Sum of Squared Error (SSE). The correlation analysis is performed at three levels: frame, MB-row and MB. Results show that correlation values are consistently high at the frame level and for most MB-row positions, while lower values are obtained at the MB level and for specific rows at the MB-row level. Finally, to assess the results obtained by the correlation analysis, the different depth-shift distortion models are employed in two algorithms of the rate-distortion optimization (RDO) cycle of the depth encoder: (i) Quantization Parameter (QP) selection and (ii) mode decision. We evaluate the QP selection algorithm at three levels: frame, MB-row and MB, and the mode decision at the MB level. At the frame level, results show that the use of depth-shift distortion is equivalent to that of synthesis distortion, with the advantage of a lower computational complexity. At sub-frame levels, the results are consistent with the comparative correlation results, giving guidelines for the use of an efficient depth-shift distortion model in low-complexity depth encoders. Highlights: (i) analysis of the use of depth-shift distortion for low-complexity 3D Video encoders; (ii) depth-shift distortion can substitute synthesis distortion at the frame or MB-row coding levels; (iii) SAE is preferable to SSE to aggregate the depth-shift distortion of a depth region.
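A hedged sketch of a block-level depth-shift distortion measure in the spirit of the models above: each pixel's depth coding error is converted into the horizontal shift it would cause in the synthesized view using the common linear 8-bit depth-to-disparity mapping of DIBR, and the per-block shift is aggregated with either SAE or SSE. The camera parameters and the exact mapping are illustrative assumptions, not the paper's models.

```python
# Block-level depth-shift distortion sketch: depth coding errors are mapped to
# horizontal pixel shifts with the common linear 8-bit depth-to-disparity model,
# then aggregated with SAE or SSE. Camera parameters below are arbitrary examples.
import numpy as np

def depth_shift_block_distortion(depth_orig, depth_coded, focal, baseline,
                                 z_near, z_far, aggregate="SAE"):
    scale = focal * baseline / 255.0 * (1.0 / z_near - 1.0 / z_far)  # pixels per depth level
    shift = scale * (depth_orig.astype(np.float64) - depth_coded.astype(np.float64))
    if aggregate == "SAE":
        return np.sum(np.abs(shift))
    return np.sum(shift ** 2)                                        # SSE aggregation

# Example on a random 16x16 depth block with a small synthetic coding error.
rng = np.random.default_rng(0)
orig = rng.integers(0, 256, (16, 16))
coded = np.clip(orig + rng.integers(-2, 3, (16, 16)), 0, 255)
print(depth_shift_block_distortion(orig, coded, focal=1000.0, baseline=0.05,
                                   z_near=0.3, z_far=5.0))
```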
Advanced Video and Signal Based Surveillance | 2015
Ana I. Maqueda; Arturo Ruano; Carlos R. del-Blanco; Pablo Carballeira; Fernando Jaureguizar; Narciso N. García
Human-action recognition through local spatio-temporal features has been widely applied because of its simplicity and reasonable computational complexity. The most common method to represent such features is the well-known Bag-of-Words approach, which turns a Multiple-Instance Learning problem into a supervised learning one that can be addressed by a standard classifier. In this paper, a learning framework for human-action recognition that follows this strategy is presented. First, spatio-temporal features are detected. Second, they are described by HOG-HOF descriptors and then represented by a Bag-of-Words approach to create a feature vector representation. The resulting high-dimensional features are reduced by means of a subspace random-projection technique that is able to retain almost all the original information. Lastly, the reduced feature vectors are delivered to a classifier called Citation K-Nearest Neighborhood, especially adapted to Multiple-Instance Learning frameworks. Excellent results have been obtained, outperforming other state-of-the-art approaches on a public database.
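The snippet below illustrates the dimensionality-reduction step on made-up data: Bag-of-Words histograms are projected onto a random subspace, which approximately preserves pairwise distances by Johnson-Lindenstrauss-type arguments. A plain k-NN classifier stands in for the Citation-kNN-style Multiple-Instance classifier used in the paper; the data, class count and component count are placeholders.

```python
# Random-projection reduction of Bag-of-Words histograms on fake data, followed
# by a plain k-NN classifier as a stand-in for the MIL-adapted Citation-kNN.
import numpy as np
from sklearn.random_projection import SparseRandomProjection
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
bow = rng.random((200, 4000))        # 200 clips x 4000-word BoW histograms (placeholder data)
labels = rng.integers(0, 6, 200)     # 6 hypothetical action classes

projector = SparseRandomProjection(n_components=256, random_state=0)
reduced = projector.fit_transform(bow)

clf = KNeighborsClassifier(n_neighbors=5).fit(reduced[:150], labels[:150])
print("toy accuracy:", clf.score(reduced[150:], labels[150:]))
```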
Journal of Visual Communication and Image Representation | 2014
Pablo Carballeira; Julián Cabrera; Fernando Jaureguizar; Narciso N. García
We present a framework for the analysis of the decoding delay in multiview video coding (MVC). We show that in real-time applications, an accurate estimation of the decoding delay is essential to achieve a minimum communication latency. As opposed to single-view codecs, the complexity of the multiview prediction structure and the parallel decoding of several views require a systematic analysis of this decoding delay, which we solve using graph theory and a model of the decoder hardware architecture. Our framework assumes a decoder implementation on general-purpose multi-core processors with multi-threading capabilities. For this hardware model, we show that frame processing times depend on the computational load of the decoder, and we provide an iterative algorithm to jointly compute frame processing times and decoding delay. Finally, we show that decoding delay analysis can be applied to design decoders with the objective of minimizing the communication latency of the MVC system.
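As a simplified illustration of why limited processing capacity lengthens the decoding delay, the sketch below runs a toy list-scheduling simulation of multiview decoding on a fixed number of cores: a frame becomes ready once its reference frames are decoded and its bitstream has arrived. This is not the paper's iterative joint estimation algorithm; frame names, arrival times and the greedy scheduling rule are illustrative.

```python
# Toy list-scheduling simulation of multiview decoding on a limited number of
# cores: each frame starts once its references are decoded, its bitstream has
# arrived and a core is free. Frame names and timing values are up to the caller.
import heapq

def decoding_finish_times(proc_time, references, arrival, cores=2):
    """references must list every frame as a key (empty list for intra frames)."""
    free_at = [0.0] * cores                  # min-heap of instants at which cores become idle
    heapq.heapify(free_at)
    finish, pending = {}, dict(references)
    while pending:
        ready = [f for f, deps in pending.items() if all(d in finish for d in deps)]
        # greedily pick the ready frame that could start earliest (a heuristic, not optimal)
        f = min(ready, key=lambda x: max([arrival[x]] + [finish[d] for d in pending[x]]))
        deps = pending.pop(f)
        earliest = max([arrival[f]] + [finish[d] for d in deps])
        start = max(earliest, heapq.heappop(free_at))
        finish[f] = start + proc_time[f]
        heapq.heappush(free_at, finish[f])
    return finish   # per-frame decoding delay could then be taken as finish[f] - arrival[f]
```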
International Conference on Consumer Electronics | 2012
Pablo Carballeira; Julián Cabrera; Fernando Jaureguizar; Narciso N. García
We present a framework for the analysis of the decoding delay and communication latency in Multiview Video Coding. The application of this framework to MVC decoders allows the overall delay in immersive videoconference systems to be minimized.