Aditya Mavlankar
Stanford University
Publication
Featured research published by Aditya Mavlankar.
IEEE Transactions on Circuits and Systems for Video Technology | 2007
Markus Flierl; Aditya Mavlankar; Bernd Girod
We investigate the rate-distortion efficiency of motion and disparity compensated coding for multiview video. Disparity compensation exploits the correlation among the view sequences and motion compensation makes use of the temporal correlation within each view sequence. We define a matrix of pictures with N view sequences, each with K temporally successive pictures. For experimental coding purposes, a scheme based on H.264/AVC is devised. We assess the overall rate-distortion efficiency for matrices of pictures of various dimensions (N, K). Moreover, we discuss the impact of inaccurate disparity compensation within a matrix of pictures. Finally, we propose and discuss a theoretical model for multiview video coding that explains our experimental observations. Performance bounds are presented for high rates.
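The matrix of pictures described above can be illustrated with a minimal sketch. It indexes each picture by (view, time) and lists one motion reference (same view, previous picture) and one disparity reference (neighboring view, same time instant). The function name `reference_candidates` and this particular choice of references are assumptions made for illustration; the abstract does not specify the prediction structure used in the paper.

```python
from typing import Dict, Tuple

Picture = Tuple[int, int]  # (view index, time index)


def reference_candidates(view: int, time: int) -> Dict[str, Picture]:
    """Candidate reference pictures for picture (view, time).

    Assumed structure (not taken from the paper): motion compensation
    refers to the previous picture of the same view, and disparity
    compensation refers to the neighboring view at the same time instant.
    """
    refs: Dict[str, Picture] = {}
    if time > 0:
        refs["motion"] = (view, time - 1)
    if view > 0:
        refs["disparity"] = (view - 1, time)
    return refs


# Hypothetical matrix of pictures with N = 3 views and K = 4 pictures per view.
N, K = 3, 4
for n in range(N):
    for k in range(K):
        print((n, k), reference_candidates(n, k))
```

Under these assumptions, only picture (0, 0) has no reference, the first view relies on motion compensation alone, and the first time instant relies on disparity compensation alone.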
2010 18th International Packet Video Workshop | 2010
Aditya Mavlankar; Piyush Agrawal; Derek Pang; Sherif A. Halawa; Ngai-Man Cheung; Bernd Girod
ClassX is an interactive online lecture viewing system developed at Stanford University. Unlike existing solutions that restrict the user to watching only a pre-defined view, ClassX allows interactive pan/tilt/zoom while watching the video. The interactive video streaming paradigm avoids sending the entire field-of-view in the recorded high resolution, thus reducing the required data rate. To alleviate the navigation burden on the part of the online viewer, ClassX offers automatic tracking of the lecturer. ClassX also employs slide recognition technology, which allows automatic synchronization of digital presentation slides with those appearing in the lecture video. This paper presents a design overview of the ClassX system and the evaluation results of a 3-month pilot deployment at Stanford University. The results demonstrate that our system is a low-cost, efficient and pragmatic solution to interactive online lecture viewing.
Packet Video 2007 | 2007
Aditya Mavlankar; David P. Varodayan; Bernd Girod
This paper investigates region-of-interest (ROI) prediction strategies for a client-server system that interactively streams regions of high resolution video. ROI prediction enables pro-active pre-fetching of select slices of encoded video from the server to allow low latency of interaction despite the delay of packets on the network. The client has a buffer of low resolution overview video frames available. We propose and study ROI prediction schemes that can take advantage of the motion information contained in these buffered frames. The system operates in two modes. In the manual mode, the user interacts actively to view select regions in each frame of video. The ROI prediction in this mode aims to reduce the distortion experienced by the viewer in his desired ROI. In the tracking mode, the user simply indicates an object to track and the system supplies an ROI trajectory without further interaction. For this mode, the prediction aims to create a smooth and stable trajectory that satisfies the user’s expectation of tracking. While the motion information enables the tracking mode, it also improves the ROI prediction in the manual mode.
Data Compression Conference | 2007
David P. Varodayan; Aditya Mavlankar; Markus Flierl; Bernd Girod
Distributed compression is particularly attractive for stereo images since it avoids communication between cameras. Since compression performance depends on exploiting the redundancy between images, knowing the disparity is important at the decoder. Unfortunately, distributed encoders cannot calculate this disparity and communicate it. We consider the compression of grayscale stereo images, and develop an expectation maximization algorithm to perform unsupervised learning of disparity during the decoding procedure. Towards this, we devise a novel method for joint bitplane distributed source coding of grayscale images. Our experiments with both natural and synthetic 8-bit images show that the unsupervised disparity learning algorithm outperforms a system which does no disparity compensation by between 1 and more than 3 bits/pixel and performs nearly as well as a system which knows the disparity through an oracle.
International Workshop on Quality of Service | 2008
Sachin Agarwal; Jatinder Pal Singh; Aditya Mavlankar; Pierpaolo Baccichet; Bernd Girod
We evaluate the performance of a large-scale live P2P video multicast session comprising more than 120,000 peers on the Internet. Our analysis highlights P2P video multicast characteristics such as high bandwidth requirements, high peer churn, low peer persistence in the P2P multicast system, significant variance in the media stream quality delivered to peers, relatively large channel start times, and flash crowd effects of popular video content. Our analysis also indicates that peers are widely spread across the IP address space, spanning dozens of countries and hundreds of ISPs and Internet ASes. As part of the P2P multicast evaluation, we examine several QoS measures, such as the fraction of stream blocks correctly received, the number of consecutive stream blocks lost, and the channel startup time across peers. We correlate the observed quality with the underlying network and with peer behavior, suggesting several avenues for optimization and research in P2P video multicast systems.
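Two of the QoS measures named above can be computed from a per-peer reception log, as in the following minimal sketch. The log format (one boolean per stream block) and the function names are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def fraction_received(received: Sequence[bool]) -> float:
    """Fraction of stream blocks that arrived correctly (0.0 for an empty log)."""
    if not received:
        return 0.0
    return sum(received) / len(received)


def longest_loss_run(received: Sequence[bool]) -> int:
    """Length of the longest run of consecutive lost stream blocks."""
    longest = current = 0
    for ok in received:
        # Reset the counter on a received block, extend it on a lost one.
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest


# Hypothetical log for one peer: True = block received, False = block lost.
peer_log = [True, True, False, False, False, True, False, True]
print(fraction_received(peer_log))  # 0.5
print(longest_loss_run(peer_log))   # 3
```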
International Conference on Image Processing | 2008
Aditya Mavlankar; Jeonghun Noh; Pierpaolo Baccichet; Bernd Girod
Video streaming with virtual pan/tilt/zoom functionality allows the viewer to watch arbitrary regions of a high-spatial-resolution scene. In our proposed system, the user controls his region-of-interest (ROI) interactively during the streaming session. The relevant portion of the scene is rendered on his screen immediately. An additional thumbnail overview aids his navigation. We design a peer-to-peer (P2P) multicast live video streaming system to provide the control of interactive region-of-interest (IROI) to large populations of viewers while exploiting the overlap of ROIs for efficient and scalable delivery. Our P2P overlay is altered on-the-fly in a distributed manner with the changing ROIs of the peers. The main challenges for such a system are posed by the stringent latency constraint, the churn in the ROIs of peers and the limited bandwidth at the server hosting the IROI video session. Experimental results with a network simulator indicate that the delivered quality is close to that of an alternative traditional unicast client-server delivery mechanism yet requiring less uplink capacity at the server.
Picture Coding Symposium | 2009
Aditya Mavlankar; Bernd Girod
Video streaming with virtual pan/tilt/zoom functionality allows the viewer to watch arbitrary regions of a high-spatial-resolution scene. A video coding scheme with random access to arbitrary regions of arbitrary zoom factors helps avoid transmission and/or decoding of the entire high-spatial-resolution video signal. The video coding scheme, proposed in our earlier work, creates a multi-resolution representation and uses P slices of H.264/AVC. The base layer, which provides a thumbnail overview of the entire scene, is encoded using motion-compensated prediction (MCP) among temporally successive frames. To provide efficient random access, we avoid MCP among temporally successive frames of the higher resolution layers. Instead, upward prediction from the reconstructed thumbnail frames is used for coding high-resolution P slices. In this paper, we show that background extraction and long-term memory motion-compensated prediction can reduce the bitrate by up to 85% while retaining efficient random access capability.
Archive | 2010
Aditya Mavlankar; Bernd Girod
High-spatial-resolution videos offer the possibility of viewing an arbitrary region-of-interest (RoI) interactively. The user can pan/tilt/zoom while watching the video. This chapter presents spatial-random-access-enabled video compression that encodes the content such that arbitrary RoIs corresponding to different zoom factors can be extracted from the compressed bit-stream. The chapter also covers RoI trajectory prediction, which allows pre-fetching relevant content in a streaming scenario. The more accurate the prediction, the lower the percentage of missing pixels. RoI prediction techniques can perform better by adapting according to the video content in addition to simply extrapolating previous moves of the input device. Finally, the chapter presents a streaming system that employs application-layer peer-to-peer (P2P) multicast while still allowing the users to freely choose individual RoIs. The P2P overlay adapts on-the-fly for exploiting the commonalities in the peers’ RoIs. This enables peers to relay data to each other in real-time, thus drastically reducing the bandwidth required from dedicated servers.
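The link between prediction accuracy and missing pixels can be made concrete with a minimal sketch. It implements only the simple baseline mentioned above, extrapolating the previous moves of the input device at constant velocity, and it assumes that exactly the predicted RoI rectangle is pre-fetched. The `Rect` type, the function names, and the sample coordinates are hypothetical; the content-adaptive predictors covered in the chapter are not modeled.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: int  # left edge in pixels
    y: int  # top edge in pixels
    w: int  # width in pixels
    h: int  # height in pixels


def extrapolate(prev: Rect, curr: Rect) -> Rect:
    """Predict the next RoI by repeating the last displacement."""
    return Rect(2 * curr.x - prev.x, 2 * curr.y - prev.y, curr.w, curr.h)


def overlap_area(a: Rect, b: Rect) -> int:
    """Area of the intersection of two rectangles (0 if they are disjoint)."""
    dx = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    dy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return max(dx, 0) * max(dy, 0)


def missing_pixel_percentage(predicted: Rect, actual: Rect) -> float:
    """Percentage of the actual RoI not covered by the pre-fetched prediction."""
    covered = overlap_area(predicted, actual)
    return 100.0 * (1 - covered / (actual.w * actual.h))


# Hypothetical trajectory: the user pans right, then also moves slightly down.
prev, curr = Rect(100, 50, 200, 100), Rect(120, 50, 200, 100)
predicted = extrapolate(prev, curr)
actual = Rect(150, 60, 200, 100)
print(predicted)
print(round(missing_pixel_percentage(predicted, actual), 2))
```

In this example the prediction misses the unexpected downward move, so roughly 14.5% of the pixels the user actually requests would be missing.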
Picture Coding Symposium | 2009
Jeonghun Noh; Pierpaolo Baccichet; Frank Hartung; Aditya Mavlankar; Bernd Girod
We review the Stanford Peer-to-Peer Multicast (SPPM) protocol for live video streaming and report recent extensions. SPPM has been designed for low latency and robust transmission of live media by organizing peers within multiple complementary trees. The recent extensions to live streaming are time-shifted streaming, interactive region-of-interest (IRoI) streaming, and streaming to mobile devices. With time-shifting, users can choose an arbitrary beginning point for watching a stream, whereas IRoI streaming allows users to select an arbitrary region to watch within a high-spatial-resolution scene. We extend the live streaming to mobile devices by addressing challenges due to heterogeneous displays, connection speeds, and decoding capabilities.
IEEE Transactions on Circuits and Systems for Video Technology | 2011
Aditya Mavlankar; Bernd Girod
High-spatial-resolution videos offer the possibility of viewing an arbitrary region-of-interest (RoI) interactively. Zoom functionality enables watching high-resolution content even on displays of lower spatial resolution. If arbitrary regions corresponding to arbitrary zoom factors can be served to the user, the transmission and/or decoding of the entire high-spatial-resolution video can be avoided. Moreover, if the video content can be encoded such that arbitrary RoIs corresponding to different zoom factors can be simply extracted from the compressed bitstream, we can avoid dedicated video encoding for each user. We propose such a video coding scheme that is vital in allowing the system to scale to large numbers of remote users as well as to encode and store the content for subsequent repeated playback. Apart from generating a multi-resolution representation, our coding scheme uses P slices from H.264/AVC. We study the tradeoff in the choice of slice size. A larger slice size enables higher coding efficiency for representing the entire scene but increases the number of pixels that have to be transmitted. The optimal slice size achieves the best tradeoff and minimizes the expected transmission bitrate. Experimental results confirm the optimality of our predicted slice size for various test cases. Furthermore, we propose an improvement based on background extraction and long-term memory motion-compensated prediction. Experiments indicate up to 85% bitrate reduction while retaining efficient random access capability.
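One side of the slice-size tradeoff, the growth in transmitted pixels as slices get larger, can be sketched as follows. The sketch assumes a regular grid of rectangular slices and that every slice overlapping the RoI must be sent in full; the function name and the sample RoI are hypothetical. It does not model coding efficiency, so it cannot by itself locate the optimal slice size.

```python
def pixels_transmitted(roi_x: int, roi_y: int, roi_w: int, roi_h: int,
                       slice_w: int, slice_h: int) -> int:
    """Total pixels in all slices of a regular grid that overlap the RoI."""
    first_col = roi_x // slice_w
    last_col = (roi_x + roi_w - 1) // slice_w
    first_row = roi_y // slice_h
    last_row = (roi_y + roi_h - 1) // slice_h
    n_slices = (last_col - first_col + 1) * (last_row - first_row + 1)
    return n_slices * slice_w * slice_h


# Hypothetical RoI: x, y, width, height in pixels (115200 pixels in total).
roi = (100, 60, 480, 240)
for size in (32, 64, 128):
    print(size, pixels_transmitted(*roi, size, size))
# 32 147456
# 64 184320
# 128 245760
```

For this RoI, moving from 32x32 to 128x128 slices raises the transmitted pixel count from about 1.3 to about 2.1 times the RoI area, which is the overhead that has to be weighed against the better coding efficiency of larger slices.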