Publication


Featured research published by Subarna Tripathi.


International SoC Design Conference | 2015

Semantic video segmentation: Exploring inference efficiency

Subarna Tripathi; Serge J. Belongie; Youngbae Hwang; Truong Q. Nguyen

We explore the efficiency of CRF inference beyond image-level semantic segmentation and perform joint inference across video frames. The key idea is to combine the best of two worlds: semantic co-labeling and more expressive models. Our formulation enables us to perform inference over ten thousand images within seconds and makes the system well suited to video semantic segmentation. On the CamVid dataset, with TextonBoost unaries, our proposed method achieves up to 8% improvement in accuracy over individual semantic image segmentation without additional time overhead. The source code is available at https://github.com/subtri/video_inference.
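As a rough illustration of the co-labeling idea, the sketch below runs mean-field inference jointly over a whole clip rather than frame by frame. It is a minimal sketch, not the paper's exact model: the Potts-style pairwise term, the spatio-temporal box filtering of beliefs, and the function name are all assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def mean_field_colabel(unaries, n_iters=5, pairwise_weight=1.0):
    # Hypothetical sketch: joint (co-labeling) mean-field inference over a
    # clip, with the pairwise message approximated by box filtering of the
    # beliefs across space AND time so frames are labeled jointly.
    # unaries: (T, H, W, L) negative log unary potentials per frame.
    Q = np.exp(-unaries)
    Q /= Q.sum(axis=-1, keepdims=True)                # initialize beliefs
    for _ in range(n_iters):
        msg = uniform_filter(Q, size=(3, 5, 5, 1))    # spatio-temporal smoothing
        Q = np.exp(-unaries + pairwise_weight * msg)  # Potts-style update
        Q /= Q.sum(axis=-1, keepdims=True)
    return Q.argmax(axis=-1)                          # (T, H, W) label maps
```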


Workshop on Applications of Computer Vision | 2016

Detecting temporally consistent objects in videos through object class label propagation

Subarna Tripathi; Serge J. Belongie; Youngbae Hwang; Truong Q. Nguyen

Object proposals for detecting moving or static video objects need to address issues such as speed, memory complexity and temporal consistency. We propose an efficient Video Object Proposal (VOP) generation method and show its efficacy in learning a better video object detector. A deep-learning-based video object detector learned using the proposed VOPs achieves state-of-the-art detection performance on the Youtube-Objects dataset. We further propose a clustering of VOPs which can efficiently be used for detecting objects in video in a streaming fashion. As opposed to applying per-frame convolutional neural network (CNN) based object detection, our proposed method, called Objects in Video Enabler thRough LAbel Propagation (OVERLAP), needs to classify only a small fraction of all candidate proposals in every video frame through streaming clustering of object proposals and class-label propagation. Source code for VOP clustering is available at https://github.com/subtri/streaming_VOP_clustering.
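The computational saving comes from classifying one proposal per cluster and propagating its label. A minimal sketch of that idea follows; the function name, the choice of representative, and the `classify` callable are assumptions, not the released implementation.

```python
import numpy as np

def overlap_label_propagation(proposals, cluster_ids, classify):
    # Hypothetical sketch: run the expensive CNN classifier on a single
    # representative proposal per cluster, then propagate that class label
    # to every other proposal in the same (temporally consistent) cluster.
    # proposals: list of proposal crops; cluster_ids: per-proposal cluster
    # assignment from streaming clustering; classify: the CNN (a callable).
    labels = np.empty(len(proposals), dtype=object)
    for c in np.unique(cluster_ids):
        members = np.flatnonzero(cluster_ids == c)
        rep = members[0]                            # e.g. earliest proposal
        labels[members] = classify(proposals[rep])  # one CNN call per cluster
    return labels
```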


British Machine Vision Conference | 2016

Context Matters: Refining Object Detection in Video with Recurrent Neural Networks

Subarna Tripathi; Zachary C. Lipton; Serge J. Belongie; Truong Q. Nguyen

Given the vast amounts of video available online, and recent breakthroughs in object detection with static images, object detection in video offers a promising new frontier. However, motion blur and compression artifacts cause substantial frame-level variability, even in videos that appear smooth to the eye. Additionally, video datasets tend to have sparsely annotated frames. We present a new framework for improving object detection in videos that captures temporal context and encourages consistency of predictions. First, we train a pseudo-labeler, that is, a domain-adapted convolutional neural network for object detection. The pseudo-labeler is first trained individually on the subset of labeled frames, and then subsequently applied to all frames. Then we train a recurrent neural network that takes as input sequences of pseudo-labeled frames and optimizes an objective that encourages both accuracy on the target frame and consistency across consecutive frames. The approach incorporates strong supervision of target frames, weak-supervision on context frames, and regularization via a smoothness penalty. Our approach achieves mean Average Precision (mAP) of 68.73, an improvement of 7.1 over the strongest image-based baselines for the Youtube-Video Objects dataset. Our experiments demonstrate that neighboring frames can provide valuable information, even absent labels.
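The three-part objective (strong supervision on the target frame, weak supervision against pseudo-labels on context frames, and a smoothness penalty) can be sketched as below. This is an assumed formulation: the function name, the MSE losses, and the weights are illustrative, not the paper's exact choices.

```python
import torch.nn.functional as F

def sequence_loss(preds, pseudo, target, target_idx,
                  weak_weight=0.5, smooth_weight=0.1):
    # Hypothetical sketch of the objective described above.
    # preds/pseudo: (T, D) per-frame RNN outputs and pseudo-labeler outputs;
    # target: (D,) ground truth for the labeled frame at target_idx.
    strong = F.mse_loss(preds[target_idx], target)    # labeled target frame
    weak = F.mse_loss(preds, pseudo)                  # pseudo-labeled context
    smooth = (preds[1:] - preds[:-1]).pow(2).mean()   # consecutive-frame consistency
    return strong + weak_weight * weak + smooth_weight * smooth
```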


Asian Conference on Pattern Recognition | 2015

Real-time sign language fingerspelling recognition using convolutional neural networks from depth map

Byeongkeun Kang; Subarna Tripathi; Truong Q. Nguyen

Sign language recognition is important for natural and convenient communication between the deaf community and the hearing majority. We take the highly efficient initial step of building an automatic fingerspelling recognition system using convolutional neural networks (CNNs) applied to depth maps. In this work, we consider a relatively larger number of classes compared with the previous literature. We train CNNs for the classification of 31 alphabets and numbers using a subset of collected depth data from multiple subjects. Using different learning configurations, such as hyper-parameter selection with and without validation, we achieve 99.99% accuracy for observed signers and 83.58% to 85.49% accuracy for new signers. The results show that accuracy improves as we include more data from different subjects during training. The processing time is 3 ms for the prediction of a single image. To the best of our knowledge, the system achieves the highest accuracy and speed reported to date. The trained model and dataset are available in our repository.
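For concreteness, a minimal stand-in for such a classifier is sketched below; the layer sizes are assumptions and do not reflect the authors' architecture, only the task shape (single-channel depth input, 31 output classes).

```python
import torch.nn as nn

# Minimal sketch (assumed architecture): a small CNN mapping one-channel
# depth maps to 31 fingerspelling classes.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.LazyLinear(128), nn.ReLU(),   # infers input size from the depth-map shape
    nn.Linear(128, 31),              # one logit per alphabet/number class
)
```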


International Conference on Advances in Pattern Recognition | 2009

Online Improved Eigen Tracking

Subarna Tripathi; Santanu Chaudhury; Sumantra Dutta Roy

We present a novel predictive statistical framework to improve the performance of an Eigen Tracker, which uses fast and efficient eigenspace updates to learn new views of the tracked object on the fly via candid covariance-free incremental PCA. The proposed system detects and tracks an object in the scene by learning the object's appearance model online, motivated by a non-traditional uniform norm. It speeds up the tracker many-fold by avoiding the nonlinear optimization generally used in the literature.
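One candid covariance-free incremental PCA (CCIPCA) step, in the style of Weng et al., can be sketched as follows; the tracker's exact variant may differ, and the function name and amnesic-rate constant are assumptions.

```python
import numpy as np

def ccipca_update(V, x, n, amnesia=2.0):
    # Sketch of one CCIPCA step: update eigenvector estimates from a new
    # observation without ever forming a covariance matrix.
    # V: (k, d) unnormalized eigenvector estimates; x: (d,) mean-removed
    # observation (a new view of the tracked object); n: samples seen so far.
    eta = (1.0 + amnesia) / (n + 1.0)        # amnesic learning rate
    u = x.astype(float).copy()
    for i in range(len(V)):
        vn = V[i] / (np.linalg.norm(V[i]) + 1e-12)
        V[i] = (1 - eta) * V[i] + eta * np.dot(u, vn) * u  # covariance-free update
        vn = V[i] / (np.linalg.norm(V[i]) + 1e-12)
        u = u - np.dot(u, vn) * vn           # deflate residual for next component
    return V
```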


Workshop on Applications of Computer Vision | 2017

A Statistical Approach to Continuous Self-Calibrating Eye Gaze Tracking for Head-Mounted Virtual Reality Systems

Subarna Tripathi; Brian K. Guenter

We present a novel, automatic eye gaze tracking scheme inspired by the smooth-pursuit eye motion that occurs while playing mobile games or watching virtual reality content. Our algorithm continuously calibrates an eye tracking system for a head-mounted display. This eliminates the need for an explicit calibration step and automatically compensates for small movements of the headset with respect to the head. The algorithm finds correspondences between corneal motion and screen-space motion, and uses these to build Gaussian Process Regression models. A combination of these models provides a continuous mapping from corneal position to screen-space position. Accuracy is nearly as good as that achieved with an explicit calibration step.
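A minimal sketch of the regression step appears below, fitting a corneal-to-screen mapping with scikit-learn; the kernel choice and the stand-in correspondence data are assumptions, not the authors' models.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Stand-in correspondences, as would be gathered from smooth-pursuit matching:
# 2-D corneal positions paired with 2-D screen-space positions (pixels).
corneal = np.random.rand(200, 2)
screen = corneal * np.array([1920.0, 1080.0])

# Fit a Gaussian Process Regression model from corneal to screen space.
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=0.1) + WhiteKernel())
gpr.fit(corneal, screen)                 # continuous, implicit calibration
gaze_xy = gpr.predict(corneal[:1])       # corneal position -> screen position
```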


Workshop on Applications of Computer Vision | 2014

Improving streaming video segmentation with early and mid-level visual processing

Subarna Tripathi; Youngbae Hwang; Serge J. Belongie; Truong Q. Nguyen

Despite recent advances in video segmentation, many opportunities remain to improve it using a variety of low- and mid-level visual cues. We propose improvements to the leading streaming graph-based hierarchical video segmentation (streamGBH) method based on early and mid-level visual processing. Extensive experimental analysis of our approach validates the improvement of the hierarchical supervoxel representation by incorporating motion and color with effective filtering. We also pose and illuminate some open questions towards intermediate-level video analysis as a further extension to streamGBH. We exploit the supervoxels as an initialization for the estimation of dominant affine motion regions, followed by merging of such motion regions, in order to hierarchically segment a video in a novel motion-segmentation framework aimed at subsequent applications such as foreground recognition.
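The initialization step, fitting a dominant affine motion model per supervoxel so that similarly moving regions can later be merged, might look like the sketch below; the function name and input format are assumptions.

```python
import cv2

def dominant_affine_motions(pts_prev_by_sv, pts_next_by_sv):
    # Hypothetical sketch: fit a dominant 2x3 affine motion model per
    # supervoxel from tracked point correspondences; supervoxels with
    # similar models can then be merged hierarchically.
    # Both inputs map supervoxel id -> (N, 2) float32 point arrays (assumed).
    models = {}
    for sv, prev_pts in pts_prev_by_sv.items():
        M, inliers = cv2.estimateAffine2D(prev_pts, pts_next_by_sv[sv],
                                          method=cv2.RANSAC)
        if M is not None:                 # enough inliers for a robust fit
            models[sv] = M
    return models
```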


International Conference on Pattern Recognition | 2008

Parametric video compression using appearance space

Santanu Chaudhury; Subarna Tripathi; Sumantra Dutta Roy

The novelty of the approach presented in this paper is its unique object-based video coding framework for videos obtained from a static camera. As opposed to most existing methods, the proposed method does not require explicit 2D or 3D models of objects and hence is general enough to handle varying types of objects in the scene. The proposed system detects and tracks an object in the scene by learning the appearance model of each object online using a non-traditional uniform-norm-based subspace. At the same time, the object is coded using its projection coefficients onto the orthonormal basis of the learnt subspace. The tracker incorporates a predictive framework based on a Kalman filter for predicting the five motion parameters. The proposed method shows substantially better compression than MPEG-2-based coding with almost no additional complexity.
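The coding idea itself is compact: transmit the projection coefficients onto the learnt orthonormal basis rather than pixels. A minimal sketch, with assumed function names and flattened-vector shapes:

```python
import numpy as np

def encode_object(patch, basis, mean):
    # Appearance-space coding sketch: represent the tracked object patch by
    # its k projection coefficients onto the learnt orthonormal basis.
    # basis: (k, d) orthonormal rows; patch, mean: flattened (d,) vectors.
    return basis @ (patch - mean)        # k coefficients to transmit

def decode_object(coeffs, basis, mean):
    # Decoder-side reconstruction of the object patch from the coefficients.
    return basis.T @ coeffs + mean
```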


International Symposium on Broadband Multimedia Systems and Broadcasting | 2008

A scene change independent high quality constant bit rate control algorithm for MPEG4 simple profile transcoding

Subarna Tripathi; Emiliano Piccinelli

Video transcoding is nowadays becoming more and more popular, due to the necessity to cope with several different encoding standards, available bandwidths and decoder capabilities. In this paper we discuss a strategy for implementing a buffer-based constant-bit-rate control algorithm in an MPEG-2 to MPEG-4 Simple Profile transcoder that performs well even during scene changes, while always preserving the so-called VBV compliance. This technique is extendable to transcoding to any video standard having only predicted (P) frames.
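A single buffer-based rate-control decision can be sketched as below: the quantizer rises as the virtual (VBV) buffer fills and falls as it drains. The linear gain and function name are assumptions, not the paper's algorithm.

```python
def next_qp(buffer_fullness, buffer_size, base_qp, gain=10.0):
    # Hypothetical sketch of one CBR rate-control step: steer the quantizer
    # from the virtual buffer's fill level to keep the stream VBV-compliant.
    deviation = buffer_fullness / buffer_size - 0.5   # signed fill deviation
    qp = base_qp + gain * deviation
    return max(1, min(31, round(qp)))                 # MPEG-4 SP QP range 1..31
```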


International Conference on Signal Processing | 2007

Pre-Coded Video Transcoding into H.264 with Arbitrary Resolution Change

Subarna Tripathi; Kaushik Saha; Emiliano Piccinelli

This paper describes a new transcoding algorithm able to transcode any coded (e.g. MPEG-2) bit-stream into an H.264 sequence with an arbitrary spatial resolution change. The visual quality at a given input and output bit-rate is close or equal to that of full decoding followed by full re-encoding (0.5 dB to 2 dB less in PSNR than re-encoding the stream at the target resolution), while, from the complexity point of view, the proposed transcoding approach is at least ten times faster than re-encoding. The experimental results show that this H.264 transcoder always gives about 20 to 60% better compression than the size of the original MPEG-2 sequence scaled by the target resolution ratio, at equal subjective quality. Using constant quantization parameters in both transcoding and re-encoding, transcoding gives 20 to 40% less compression than re-encoding.

Collaboration


Dive into Subarna Tripathi's collaborations.

Top Co-Authors

Santanu Chaudhury
Indian Institute of Technology Delhi

Zachary C. Lipton
Carnegie Mellon University

Sumantra Dutta Roy
Indian Institute of Technology Delhi