Publication


Featured research published by Suyog Dutt Jain.


European Conference on Computer Vision | 2014

Supervoxel-Consistent Foreground Propagation in Video

Suyog Dutt Jain; Kristen Grauman

A major challenge in video segmentation is that the foreground object may move quickly in the scene while its appearance and shape evolve over time. While pairwise potentials used in graph-based algorithms help smooth labels between neighboring (super)pixels in space and time, they offer only a myopic view of consistency and can be misled by inter-frame optical flow errors. We propose a higher-order supervoxel label consistency potential for semi-supervised foreground segmentation. Given an initial frame with a manual annotation for the foreground object, our approach propagates the foreground region through time, leveraging bottom-up supervoxels to guide its estimates towards long-range coherent regions. We validate our approach on three challenging datasets and achieve state-of-the-art results.
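
To make the higher-order term concrete, here is a minimal sketch (with assumed weights and cost functions, not the authors' implementation) of an energy that augments the usual unary and pairwise terms with a supervoxel label-consistency penalty: a labeling that splits a bottom-up supervoxel is charged in proportion to the supervoxel's impurity, which is what pushes solutions towards long-range coherent regions.

```python
import numpy as np

def segmentation_energy(labels, unary, edges, pairwise_w, supervoxels, hi_w=1.0):
    """Toy energy: unary + pairwise smoothness + higher-order supervoxel consistency.

    labels      : (N,) integer 0/1 foreground labels per (super)pixel
    unary       : (N, 2) per-node costs for background/foreground
    edges       : list of (i, j) neighboring node pairs in space and time
    pairwise_w  : dict mapping (i, j) -> smoothness weight
    supervoxels : list of index arrays, one per bottom-up supervoxel
    hi_w        : weight of the higher-order consistency potential (assumed)
    """
    e = unary[np.arange(len(labels)), labels].sum()
    # Pairwise Potts term: penalize label disagreement between neighbors.
    for (i, j) in edges:
        e += pairwise_w[(i, j)] * (labels[i] != labels[j])
    # Higher-order term: penalize label impurity inside each supervoxel,
    # encouraging coherence beyond what pairwise links can reach.
    for sv in supervoxels:
        frac_fg = labels[sv].mean()
        e += hi_w * len(sv) * min(frac_fg, 1.0 - frac_fg)
    return e
```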


International Conference on Computer Vision | 2011

Facial expression recognition with temporal modeling of shapes

Suyog Dutt Jain; Changbo Hu; Jake K. Aggarwal

Conditional Random Fields (CRFs) can be used as a discriminative approach for simultaneous sequence segmentation and frame labeling. Latent-Dynamic Conditional Random Fields (LDCRFs) incorporate hidden state variables within CRFs, which model sub-structure motion patterns and the dynamics between labels. Motivated by the success of LDCRFs in gesture recognition, we propose a framework for automatic facial expression recognition from continuous video sequences by modeling temporal variations within shapes using LDCRFs. We show that the proposed approach outperforms CRFs for recognizing facial expressions. Using Principal Component Analysis (PCA), we study the separability of various expression classes in lower-dimensional projected spaces. By comparing the performance of CRFs and LDCRFs against that of Support Vector Machines (SVMs), we demonstrate that temporal variations within shapes are crucial in classifying expressions, especially those with a small range of facial motion such as anger and sadness. We also show empirically that using only changes in facial appearance over time, without shape variations, is not sufficient to obtain high performance for facial expression recognition.
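
For readers unfamiliar with LDCRFs, the key difference from a plain CRF is a layer of hidden states that is marginalized out per label. In the standard LDCRF formulation (due to Morency et al.; shown here for context, not taken from this paper), each label owns a disjoint set of hidden states and the model sums over all hidden paths consistent with the label sequence:

```latex
% LDCRF: each label y_t owns a disjoint set of hidden states H_{y_t};
% the conditional marginalizes over all hidden paths consistent with y.
P(\mathbf{y} \mid \mathbf{x}; \theta)
  = \sum_{\mathbf{h}\,:\,\forall t,\ h_t \in H_{y_t}} P(\mathbf{h} \mid \mathbf{x}; \theta),
\qquad
P(\mathbf{h} \mid \mathbf{x}; \theta)
  = \frac{\exp\!\big(\theta \cdot \mathbf{F}(\mathbf{h}, \mathbf{x})\big)}
         {\sum_{\mathbf{h}'} \exp\!\big(\theta \cdot \mathbf{F}(\mathbf{h}', \mathbf{x})\big)}
```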


Computer Vision and Pattern Recognition | 2017

FusionSeg: Learning to Combine Motion and Appearance for Fully Automatic Segmentation of Generic Objects in Videos

Suyog Dutt Jain; Bo Xiong; Kristen Grauman

We propose an end-to-end learning framework for segmenting generic objects in videos. Our method learns to combine appearance and motion information to produce pixel-level segmentation masks for all prominent objects in videos. We formulate this task as a structured prediction problem and design a two-stream fully convolutional neural network which fuses together motion and appearance in a unified framework. Since large-scale video datasets with pixel-level segmentations are scarce, we show how to bootstrap weakly annotated videos together with existing image recognition datasets for training. Through experiments on three challenging video segmentation benchmarks, our method substantially improves the state-of-the-art for segmenting generic (unseen) objects. Code and pre-trained models are available on the project website.
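
As a rough illustration of the two-stream idea (a sketch with assumed layer sizes and a toy fusion head, not the FusionSeg architecture), an appearance stream over RGB and a motion stream over optical flow can each emit per-pixel foreground logits that a small fusion layer combines:

```python
import torch
import torch.nn as nn

class TwoStreamSegNet(nn.Module):
    """Toy two-stream fully convolutional net: RGB + optical flow -> fg logits."""
    def __init__(self):
        super().__init__()
        def stream(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 1, 1),          # per-pixel foreground logit
            )
        self.appearance = stream(3)   # RGB frame
        self.motion = stream(2)       # 2-channel optical flow field
        # Fusion head: combine the two per-pixel predictions.
        self.fuse = nn.Conv2d(2, 1, 1)

    def forward(self, rgb, flow):
        a = self.appearance(rgb)
        m = self.motion(flow)
        return self.fuse(torch.cat([a, m], dim=1))  # fused foreground logits

# Usage: logits = TwoStreamSegNet()(torch.randn(1, 3, 64, 64), torch.randn(1, 2, 64, 64))
```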


International Conference on Computer Vision | 2013

Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation

Suyog Dutt Jain; Kristen Grauman

The mode of manual annotation used in an interactive segmentation algorithm affects both its accuracy and ease of use. For example, bounding boxes are fast to supply, yet may be too coarse to get good results on difficult images; freehand outlines are slower to supply and more precise, yet may be overkill for simple images. Whereas existing methods assume a fixed form of input no matter the image, we propose to predict the tradeoff between accuracy and effort. Our approach learns whether a graph cuts segmentation will succeed if initialized with a given annotation mode, based on the image's visual separability and foreground uncertainty. Using these predictions, we optimize the mode of input requested on new images a user wants segmented. Whether given a single image that should be segmented as quickly as possible, or a batch of images that must be segmented within a specified time budget, we show how to select the easiest modality that will be sufficiently strong to yield high-quality segmentations. Extensive results with real users and three datasets demonstrate the impact.
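
The per-image selection logic can be pictured as follows; the modality list, effort costs, and threshold below are illustrative assumptions, and predict_success stands in for the learned predictor built on separability and uncertainty cues:

```python
# Annotation modes ordered by effort (illustrative costs in seconds).
MODALITIES = [("bounding_box", 7.0), ("sloppy_contour", 20.0), ("tight_polygon", 54.0)]

def pick_modality(image_features, predict_success, threshold=0.9):
    """Return the cheapest annotation mode predicted to yield a good segmentation.

    predict_success(mode, feats) -> probability that a graph-cuts segmentation
    initialized with this mode succeeds on an image with these features.
    """
    for mode, cost in MODALITIES:          # try cheapest first
        if predict_success(mode, image_features) >= threshold:
            return mode, cost
    return MODALITIES[-1]                   # fall back to the strongest input
```

In the budgeted batch setting, the same success predictions would feed an allocation over all images rather than a fixed per-image threshold.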


Computer Vision and Pattern Recognition | 2016

Active Image Segmentation Propagation

Suyog Dutt Jain; Kristen Grauman

We propose a semi-automatic method to obtain foreground object masks for a large set of related images. We develop a stagewise active approach to propagation: in each stage, we actively determine the images that appear most valuable for human annotation, then revise the foreground estimates in all unlabeled images accordingly. In order to identify images that, once annotated, will propagate well to other examples, we introduce an active selection procedure that operates on the joint segmentation graph over all images. It prioritizes human intervention for those images that are uncertain and influential in the graph, while also being mutually diverse. We apply our method to obtain foreground masks for over 1 million images. Our method yields state-of-the-art accuracy on the ImageNet and MIT Object Discovery datasets, and it focuses human attention more effectively than existing propagation strategies.
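
One way to picture the stagewise selection (a simplified sketch; the uncertainty, influence, and diversity terms here are generic stand-ins for the learned criteria on the joint segmentation graph) is a greedy loop that repeatedly picks the highest-scoring image and discounts the images most similar to it:

```python
import numpy as np

def select_for_annotation(uncertainty, influence, similarity, k):
    """Greedily pick k images that are uncertain, influential, and mutually diverse.

    uncertainty : (N,) per-image uncertainty of the current foreground estimate
    influence   : (N,) graph centrality of each image in the joint seg. graph
    similarity  : (N, N) pairwise image similarity in [0, 1]
    """
    score = uncertainty * influence
    chosen = []
    for _ in range(k):
        i = int(np.argmax(score))
        chosen.append(i)
        # Diversity: down-weight images similar to the one just chosen.
        score *= (1.0 - similarity[i])
        score[chosen] = -np.inf        # never re-pick an annotated image
    return chosen
```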


International Symposium on Communications, Control and Signal Processing | 2012

An interactive game for teaching facial expressions to children with Autism Spectrum Disorders

Suyog Dutt Jain; Birgi Tamersoy; Yan Zhang; Jake K. Aggarwal; Verónica Orvalho

Autism Spectrum Disorders (ASDs), a neurodevelopmental disability in children, are a cause of major concern. Children with ASDs find it difficult to express and recognize emotions, which makes it hard for them to interact socially. Conventional methods use medicinal means, special education, and behavioral analysis. They are not always successful and are usually expensive. There is a significant need to develop technology-based methods for effective intervention and cure. We propose an interactive game design which uses modern computer vision and computer graphics techniques. This game tracks facial features and uses the tracked features to: 1) recognize the facial expressions of the player, and 2) animate an avatar, which mimics the player's facial expressions. The ultimate goal of the game is to influence the emotional behavior of the player.
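
At a systems level, the per-frame pipeline described above might look like the following sketch; the tracker, classifier, and avatar interfaces are hypothetical placeholders, not the authors' code:

```python
def game_loop(camera, tracker, classifier, avatar):
    """Per-frame pipeline: track facial features, recognize expression, mimic it."""
    for frame in camera:
        landmarks = tracker.track(frame)            # facial feature points
        expression = classifier.predict(landmarks)  # e.g., happy / sad / angry
        avatar.animate(landmarks)                   # avatar mimics the player
        yield expression                            # drives the game's feedback
```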


Asian Conference on Computer Vision | 2014

Which Image Pairs Will Cosegment Well? Predicting Partners for Cosegmentation

Suyog Dutt Jain; Kristen Grauman

Cosegmentation methods segment multiple related images jointly, exploiting their shared appearance to generate more robust foreground models. While existing approaches assume that an oracle will specify which pairs of images are amenable to cosegmentation, in many scenarios such external information may be difficult to obtain. This is problematic, since coupling the "wrong" images for segmentation (even images of the same object class) can actually degrade performance relative to single-image segmentation. Rather than manually specify partner images for cosegmentation, we propose to automatically predict which images will cosegment well together. We develop a learning-to-rank approach that identifies good partners, based on paired descriptors capturing the images' amenability to joint segmentation. We compare our approach to alternative methods for partnering images, including basic image similarity, and show the advantages on two challenging datasets.
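
A minimal stand-in for the learning-to-rank step (illustrative; the paired descriptors and the logistic scoring model are generic choices, not the paper's) trains a scorer on image-pair descriptors labeled by whether cosegmentation helped, then ranks candidate partners by predicted benefit:

```python
from sklearn.linear_model import LogisticRegression

def train_partner_ranker(pair_feats, helped):
    """pair_feats[i] describes an image pair (e.g., shared-appearance cues);
    helped[i] = 1 if cosegmenting that pair beat single-image segmentation."""
    return LogisticRegression().fit(pair_feats, helped)

def rank_partners(model, query_pair_feats, candidate_ids):
    """Order candidate partner images by predicted benefit of joint segmentation."""
    scores = model.predict_proba(query_pair_feats)[:, 1]
    order = sorted(zip(scores, candidate_ids), key=lambda t: t[0], reverse=True)
    return [cid for _, cid in order]
```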


International Journal of Computer Vision | 2018

Predicting Foreground Object Ambiguity and Efficiently Crowdsourcing the Segmentation(s)

Danna Gurari; Kun He; Bo Xiong; Jianming Zhang; Mehrnoosh Sameki; Suyog Dutt Jain; Stan Sclaroff; Margrit Betke; Kristen Grauman

We propose the ambiguity problem for the foreground object segmentation task and motivate the importance of estimating and accounting for this ambiguity when designing vision systems. Specifically, we distinguish between images which lead multiple annotators to segment different foreground objects (ambiguous) versus minor inter-annotator differences of the same object. Taking images from eight widely used datasets, we crowdsource labeling the images as “ambiguous” or “not ambiguous” to segment in order to construct a new dataset we call STATIC. Using STATIC, we develop a system that automatically predicts which images are ambiguous. Experiments demonstrate the advantage of our prediction system over existing saliency-based methods on images from vision benchmarks and images taken by blind people who are trying to recognize objects in their environment. Finally, we introduce a crowdsourcing system to achieve cost savings for collecting the diversity of all valid “ground truth” foreground object segmentations by collecting extra segmentations only when ambiguity is expected. Experiments show our system eliminates up to 47% of human effort compared to existing crowdsourcing methods with no loss in capturing the diversity of ground truths.
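
The cost-saving rule reduces to spending redundant annotations only where ambiguity is expected; a sketch, with an assumed ambiguity classifier and illustrative redundancy counts:

```python
def segmentations_to_collect(images, predict_ambiguous, n_ambiguous=5, n_clear=1):
    """Plan how many crowd segmentations to collect per image.

    Collect several segmentations only when multiple valid ground truths are
    predicted (ambiguous image); otherwise a single segmentation suffices.
    """
    return {img: (n_ambiguous if predict_ambiguous(img) else n_clear)
            for img in images}
```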


Conference on Computers and Accessibility | 2018

BrowseWithMe: An Online Clothes Shopping Assistant for People with Visual Impairments

Abigale Stangl; Esha Kothari; Suyog Dutt Jain; Tom Yeh; Kristen Grauman; Danna Gurari

Our interviews with people who have visual impairments show clothes shopping is an important activity in their lives. Unfortunately, clothes shopping web sites remain largely inaccessible. We propose design recommendations to address online accessibility issues reported by visually impaired study participants and an implementation, which we call BrowseWithMe, to address these issues. BrowseWithMe employs artificial intelligence to automatically convert a product web page into a structured representation that enables a user to interactively ask the BrowseWithMe system what the user wants to learn about a product (e.g., What is the price? Can I see a magnified image of the pants?). This enables people to be active solicitors of the specific information they are seeking rather than passive listeners of unparsed information. Experiments demonstrate BrowseWithMe can make online clothes shopping more accessible and produce accurate image descriptions.
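
Conceptually, the structured representation and the question answering can be sketched as a parsed product record plus a simple intent dispatcher; the field names and matching rules below are hypothetical, not BrowseWithMe's actual schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Product:
    """Structured record extracted from a product web page (illustrative schema)."""
    title: str
    price: str
    description: str
    image_descriptions: List[str] = field(default_factory=list)

def answer(product: Product, question: str) -> str:
    """Route a user's question to the relevant field of the parsed product."""
    q = question.lower()
    if "price" in q:
        return product.price
    if "image" in q or "picture" in q or "look" in q:
        return " ".join(product.image_descriptions)
    return product.description
```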


National Conference on Artificial Intelligence | 2016

Click Carving: Segmenting Objects in Video with Point Clicks

Suyog Dutt Jain; Kristen Grauman

Collaboration


Dive into Suyog Dutt Jain's collaborations.

Top Co-Authors

Kristen Grauman (University of Texas at Austin)
Bo Xiong (University of Texas at Austin)
Danna Gurari (University of Texas at Austin)
Jake K. Aggarwal (University of Texas at Austin)
Maria Esteva (University of Texas at Austin)
Weijia Xu (University of Texas at Austin)
Abigale Stangl (University of Colorado Boulder)
Birgi Tamersoy (University of Texas at Austin)
Changbo Hu (University of Texas at Austin)