Abir Das
University of California, Riverside
Publications
Featured research published by Abir Das.
European Conference on Computer Vision | 2014
Abir Das; Anirban Chakraborty; Amit K. Roy-Chowdhury
Most existing person re-identification methods focus on matching persons between pairs of cameras (camera-pairwise re-identification) without explicitly maintaining consistency of the results across the network. This may lead to infeasible associations when results from different camera pairs are combined. In this paper, we propose a network-consistent re-identification (NCR) framework, formulated as an optimization problem that not only maintains consistency in re-identification results across the network, but also improves the pairwise re-identification performance between all individual camera pairs. The problem can be solved as a binary integer program, leading to a globally optimal solution. We also extend the proposed approach to the more general case where not all persons appear in every camera. Using two benchmark datasets, we validate our approach and compare against state-of-the-art methods.
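To make the consistency idea concrete, below is a minimal brute-force sketch for a three-camera network; exhaustive search over permutations is an illustrative stand-in for the paper's binary integer program, and the similarity matrices are hypothetical inputs.

```python
# Brute-force sketch of network-consistent re-identification (NCR) for a
# 3-camera network. The paper solves a binary integer program; exhaustive
# search over permutations is only an illustrative stand-in.
import itertools
import numpy as np

def ncr_triplet(sim12, sim23, sim13):
    """Pick associations P12, P23 (with the implied P13 = P12 @ P23) that
    maximize total similarity while staying loop-consistent."""
    n = sim12.shape[0]
    best_score, best = -np.inf, None
    for p12 in itertools.permutations(range(n)):
        P12 = np.eye(n)[list(p12)]                 # camera 1 -> camera 2
        for p23 in itertools.permutations(range(n)):
            P23 = np.eye(n)[list(p23)]             # camera 2 -> camera 3
            P13 = P12 @ P23                        # consistency along the loop
            score = (sim12 * P12).sum() + (sim23 * P23).sum() + (sim13 * P13).sum()
            if score > best_score:
                best_score, best = score, (P12, P23, P13)
    return best_score, best

rng = np.random.default_rng(0)                     # hypothetical similarities
print(ncr_triplet(*(rng.random((4, 4)) for _ in range(3)))[0])
```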
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2015
Niki Martinel; Abir Das; Christian Micheloni; Amit K. Roy-Chowdhury
Person re-identification in a non-overlapping multi-camera scenario is an open challenge in computer vision because of the large changes in appearance caused by variations in viewing angle, lighting, background clutter, and occlusion across cameras. As a result of these variations, features describing the same person are transformed between cameras. In this work, we build upon the observation that these feature transformations lie in a nonlinear function space of all possible feature transformations. To model them, the feature space is nonlinearly warped to obtain “warp functions”. Warp functions between two instances of the same target form the set of feasible warp functions, while those between instances of different targets form the set of infeasible warp functions; together they constitute the warp function space (WFS). We propose to learn a discriminating surface separating these two sets in the WFS and to re-identify persons by classifying a test warp function as feasible or infeasible. Toward this objective, a Random Forest (RF) classifier is employed, which effectively weights the warp function components according to their importance in separating the feasible from the infeasible warp functions in the WFS. Extensive experiments on five datasets show the superior performance of the proposed approach over state-of-the-art person re-identification methods. Our approach outperforms all other methods when large illumination variations are considered, and it also reaches the best average performance over multiple combinations of the datasets, showing that it is not designed merely to address a specific challenge posed by a particular dataset.
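As a rough illustration of the WFS idea, the sketch below represents a warp, for simplicity, as a per-dimension feature difference and trains a Random Forest to separate feasible from infeasible warps; the synthetic data and the difference-based warp representation are assumptions, not the paper's nonlinear warp model.

```python
# Hedged sketch: a "warp" is approximated as a per-dimension feature
# difference; a Random Forest then separates feasible (same person) from
# infeasible (different persons) warps. All data here is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
cam_a = rng.normal(size=(200, 64))                  # descriptors in camera A
cam_b = 1.3 * cam_a + 0.2 + rng.normal(scale=0.05, size=(200, 64))  # camera B

feasible = cam_b - cam_a                            # same-person warps
infeasible = cam_b[rng.permutation(200)] - cam_a    # cross-person warps

X = np.vstack([feasible, infeasible])
y = np.r_[np.ones(200), np.zeros(200)]
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Re-identify a probe by scoring its warp to each gallery candidate and
# keeping the candidate whose warp looks most feasible.
probe_warps = cam_b[:5] - cam_a[0]                  # probe 0 vs 5 candidates
print(clf.predict_proba(probe_warps)[:, 1])
```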
European Conference on Computer Vision | 2016
Niki Martinel; Abir Das; Christian Micheloni; Amit K. Roy-Chowdhury
Person re-identification is an open and challenging problem in computer vision. The majority of efforts have been spent either on designing the best feature representation or on learning the optimal matching metric, and most approaches have neglected the problem of adapting the selected features or the learned model over time. To address this problem, we propose a temporal model adaptation scheme with a human in the loop. We first introduce a similarity-dissimilarity learning method that can be trained incrementally by means of a stochastic alternating direction method of multipliers (ADMM) optimization procedure. Then, to achieve temporal adaptation with limited human effort, we exploit a graph-based approach to present the user with only the most informative probe-gallery matches that should be used to update the model. Results on three datasets show that our approach performs on par with or even better than state-of-the-art approaches while reducing the manual pairwise labeling effort by about 80%.
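A hedged sketch of the selection step follows: it ranks probes by the ambiguity of their best gallery matches, using a top-1/top-2 score margin as an illustrative stand-in for the paper's graph-based criterion.

```python
# Hedged sketch of the human-in-the-loop step: rank probes by how ambiguous
# their best gallery matches are (smallest top-1/top-2 score margin) and ask
# the annotator about those first. The margin criterion is an illustrative
# stand-in for the paper's graph-based selection.
import numpy as np

def most_informative_probes(scores, k=5):
    """scores: (num_probes, num_gallery) similarity matrix."""
    top2 = np.sort(scores, axis=1)[:, -2:]    # two best gallery scores per probe
    margin = top2[:, 1] - top2[:, 0]          # small margin = ambiguous match
    return np.argsort(margin)[:k]             # probe indices worth a human label

rng = np.random.default_rng(1)                # hypothetical score matrix
print(most_informative_probes(rng.random((50, 100))))
```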
Computer Vision and Pattern Recognition | 2017
Vasili Ramanishka; Abir Das; Jianming Zhang; Kate Saenko
Neural image/video captioning models can generate accurate descriptions, but their internal process of mapping regions to words is a black box and therefore difficult to explain. Top-down neural saliency methods can find important regions given a high-level semantic task such as object classification, but cannot use a natural language sentence as the top-down input for the task. In this paper, we propose Caption-Guided Visual Saliency to expose the region-to-word mapping in modern encoder-decoder networks and demonstrate that it is learned implicitly from caption training data, without any pixel-level annotations. Our approach can produce spatial or spatiotemporal heatmaps for both predicted captions and arbitrary query sentences. It recovers saliency without the overhead of introducing explicit attention layers, and can be used to analyze a variety of existing model architectures and improve their design. Evaluation on large-scale video and image datasets demonstrates that our approach achieves captioning performance comparable to existing methods while providing more accurate saliency heatmaps. Our code is available at visionlearninggroup.github.io/caption-guided-saliency/.
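The leave-one-out sketch below conveys the underlying intuition, that a region is salient for a word if removing it lowers that word's probability; `word_logprob` is a hypothetical hook into an encoder-decoder captioner, not the paper's actual interface.

```python
# Hedged sketch of the intuition behind caption-guided saliency: a region is
# salient for a word if dropping it lowers that word's log-probability.
# `word_logprob` is a hypothetical hook, not the paper's actual interface.
import numpy as np

def region_saliency(word_logprob, regions, caption, t):
    """Saliency of each region for the t-th word of `caption`."""
    base = word_logprob(regions, caption, t)              # all regions kept
    drops = np.array([base - word_logprob(regions[:i] + regions[i + 1:], caption, t)
                      for i in range(len(regions))])      # leave-one-out drops
    return np.maximum(drops, 0.0) / (np.abs(drops).max() + 1e-8)  # heatmap weights

# Dummy scorer for a smoke test: longer region lists give higher log-probs.
print(region_saliency(lambda regs, cap, t: 0.1 * len(regs),
                      ["r0", "r1", "r2"], "a dog runs", 1))
```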
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2016
Anirban Chakraborty; Abir Das; Amit K. Roy-Chowdhury
Existing data association techniques mostly focus on matching pairs of data-point sets and then repeating this process across space and time to achieve long-term correspondences. However, in many problems such as person re-identification, a set of data points may be observed at multiple spatio-temporal locations and/or by multiple agents in a network, and simply combining the local pairwise association results between sets of data points often leads to inconsistencies over the global space-time horizon. In this paper, we propose a novel Network Consistent Data Association (NCDA) framework, formulated as an optimization problem that not only maintains consistency in association results across the network, but also improves the pairwise data association accuracies. The proposed NCDA can be solved as a binary integer program, leading to a globally optimal solution, and is capable of handling the challenging data association scenario where the number of data points varies across different sets of instances in the network. We also present an online implementation of the NCDA method that can dynamically associate new observations with already observed data points in an iterative fashion, while maintaining network consistency. We have tested both the batch and the online NCDA in two application areas, person re-identification and spatio-temporal cell tracking, and observed consistent and highly accurate data association results in all cases.
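A minimal sketch of the online step is given below; a thresholded Hungarian assignment stands in for the paper's consistency-constrained binary integer program, and the similarity matrix and threshold are hypothetical.

```python
# Hedged sketch of the online step: one-to-one match newly observed data
# points to already-known identities, spawning a new identity when no score
# clears a threshold. The Hungarian solver is a simple stand-in for the
# paper's consistency-constrained binary integer program.
import numpy as np
from scipy.optimize import linear_sum_assignment

def online_associate(sim, thresh=0.5):
    """sim: (num_new, num_known) similarity of new points to known identities."""
    rows, cols = linear_sum_assignment(-sim)   # maximize total similarity
    labels = {}
    for r, c in zip(rows, cols):
        if sim[r, c] >= thresh:
            labels[r] = c                      # extend an existing identity
    next_id = sim.shape[1]
    for r in range(sim.shape[0]):
        if r not in labels:                    # weak or no match: new identity
            labels[r] = next_id
            next_id += 1
    return labels

rng = np.random.default_rng(2)                 # hypothetical similarities
print(online_associate(rng.random((6, 4))))
```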
International Conference on Image Processing | 2016
Rameswar Panda; Abir Das; Amit K. Roy-Chowdhury
Most traditional video summarization methods are designed to generate effective summaries for single-view videos, and thus they cannot fully exploit the complicated intra- and inter-view correlations in summarizing multi-view videos. In this paper, we introduce a novel framework for summarizing multi-view videos in a way that takes into consideration both intra- and inter-view correlations in a joint embedding space. We learn the embedding by minimizing an objective function that has two terms: one due to intra-view correlations and another due to inter-view correlations across the multiple views. The solution is obtained by using a Majorization-Minimization algorithm that monotonically decreases the cost function in each iteration. We then employ a sparse representative selection approach over the learned embedding space to summarize the multi-view videos. Experiments on several multi-view datasets demonstrate that the proposed approach clearly outperforms the state-of-the-art methods.
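To illustrate the selection step, the sketch below greedily picks frames whose embeddings best cover all others; greedy facility location is an assumed stand-in for the paper's sparse representative selection, and the embedding matrix is a hypothetical input.

```python
# Hedged sketch of the summarization step: after the joint embedding is
# learned, greedily pick frames whose embeddings best "cover" all others.
# Greedy facility location is an illustrative stand-in for the paper's
# sparse representative selection.
import numpy as np

def select_representatives(Z, k=5):
    """Z: (num_frames, dim) coordinates of all views in the joint embedding."""
    Z = Z / (np.linalg.norm(Z, axis=1, keepdims=True) + 1e-8)
    sim = Z @ Z.T                                     # cosine similarity
    covered = np.full(len(Z), -np.inf)
    chosen = []
    for _ in range(k):
        gain = np.maximum(sim, covered).sum(axis=1)   # coverage if frame added
        gain[chosen] = -np.inf                        # never re-pick a frame
        best = int(np.argmax(gain))
        chosen.append(best)
        covered = np.maximum(covered, sim[best])
    return chosen

rng = np.random.default_rng(3)                        # hypothetical embedding
print(select_representatives(rng.normal(size=(40, 16))))
```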
Computer Vision and Pattern Recognition | 2013
Shu Zhang; Abir Das; Chong Ding; Amit K. Roy-Chowdhury
People are often seen together. We use this simple observation to provide crucial additional information and increase the robustness of a video tracker. The goal of this paper is to show how, in situations where offline training data is not available, a social behavior model (SBM) can be inferred online and then integrated within the tracking algorithm. We start with tracklets (short-term, confident tracks) obtained using an existing tracker. The SBM, a graphical model, captures the spatio-temporal relationships between the tracklets and is learned online from the video. The final probability of association between the tracklets is obtained by combining individual target characteristics (e.g., their appearance) with the learned relationship model between them. The entire system is causal: the results at any given time depend only on the part of the video already observed. Experimental results on three state-of-the-art datasets show that, without having access to any offline training data or the entire test video a priori (conditions that may be restrictive in many application domains), our proposed method obtains results comparable to methods that do impose these conditions.
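A minimal sketch of the cue fusion follows; the linear combination weight `alpha` and both affinity matrices are illustrative placeholders rather than the paper's graphical-model inference.

```python
# Hedged sketch of fusing the two cues when linking tracklets: individual
# appearance affinity and the learned social-relationship affinity. The
# linear fusion weight `alpha` and both matrices are placeholders, not the
# paper's graphical-model inference.
import numpy as np
from scipy.optimize import linear_sum_assignment

def link_tracklets(appearance, social, alpha=0.7):
    """Both inputs: (num_ending, num_starting) tracklet affinities in [0, 1]."""
    affinity = alpha * appearance + (1 - alpha) * social
    rows, cols = linear_sum_assignment(-affinity)   # best one-to-one links
    return list(zip(rows, cols))

rng = np.random.default_rng(4)                      # hypothetical affinities
print(link_tracklets(rng.random((5, 5)), rng.random((5, 5))))
```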
Computer Vision and Image Understanding | 2017
Abir Das; Rameswar Panda; Amit K. Roy-Chowdhury
The problem of image-based person identification/recognition is to assign an identity to the image of an individual based on learned models that describe his/her appearance. Most traditional person identification systems rely on learning a static model from tediously labeled training data. Though manual labeling is an indispensable part of a supervised framework, for a large-scale identification system labeling a huge amount of data is a significant overhead. For large multi-sensor data, as typically encountered in camera networks, labeling many samples does not always mean more information, as redundant images get labeled several times. In this work, we propose a convex-optimization-based iterative framework that progressively and judiciously chooses a sparse but informative set of samples for labeling, with minimal overlap with previously labeled images. We also use a structure-preserving sparse-reconstruction-based classifier to reduce the training burden typically seen in discriminative classifiers. The two-stage approach leads to a novel framework for online update of the classifiers that involves only the incorporation of new labeled data rather than any expensive training phase. Using three benchmark multi-camera person re-identification datasets, we validate our approach and demonstrate that it achieves superior performance with significantly less manual labeling, showing the feasibility of learning online classification models in multi-camera big-data applications.
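The sketch below illustrates the progressive selection loop with greedy k-center sampling, an assumed stand-in for the paper's convex sparse-selection program; the features and labeled indices are hypothetical.

```python
# Hedged sketch of the progressive labeling loop: each round picks a small,
# diverse batch of unlabeled images that lie far (in feature space) from
# everything labeled so far. Greedy k-center selection is an illustrative
# stand-in for the paper's convex sparse-selection program.
import numpy as np

def next_batch_to_label(feats, labeled_idx, batch=10):
    """feats: (num_images, dim); labeled_idx: non-empty list of labeled rows."""
    dist = np.min(np.linalg.norm(feats[:, None] - feats[labeled_idx], axis=2), axis=1)
    dist[labeled_idx] = -np.inf                  # never re-label an image
    picked = []
    for _ in range(batch):
        best = int(np.argmax(dist))              # most novel remaining image
        picked.append(best)
        dist = np.minimum(dist, np.linalg.norm(feats - feats[best], axis=1))
        dist[best] = -np.inf
    return picked

rng = np.random.default_rng(5)                   # hypothetical features
print(next_batch_to_label(rng.normal(size=(100, 32)), [0, 1], batch=5))
```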
International Conference on Computer Vision | 2017
Huijuan Xu; Abir Das; Kate Saenko
ACM Multimedia | 2016
Vasili Ramanishka; Abir Das; Dong Huk Park; Subhashini Venugopalan; Lisa Anne Hendricks; Marcus Rohrbach; Kate Saenko