
Publication


Featured research published by Samarjit Das.


IEEE Transactions on Image Processing | 2012

Particle Filter With a Mode Tracker for Visual Tracking Across Illumination Changes

Samarjit Das; Amit A. Kale; Namrata Vaswani

In this correspondence, our goal is to develop a visual tracking algorithm that is able to track moving objects in the presence of illumination variations in the scene and that is robust to occlusions. We treat the illumination and motion (x-y translation and scale) parameters as the unknown “state” sequence. The observation is the entire image, and the observation model allows for occasional occlusions (modeled as outliers). The nonlinearity and multimodality of the observation model necessitate the use of a particle filter (PF). Due to the inclusion of illumination parameters, the state dimension increases, thus making regular PFs impractically expensive. We show that the recently proposed approach using a PF with a mode tracker can be used here since, even in most occlusion cases, the posterior of illumination conditioned on motion and the previous state is unimodal and quite narrow. The key idea is to importance sample on the motion states while approximating importance sampling by posterior mode tracking for estimating illumination. Experiments demonstrate the advantage of the proposed algorithm over existing PF-based approaches for various face and vehicle tracking tasks. We are also able to detect illumination model changes, e.g., those due to a transition from shadow to sunlight or vice versa, by using generalized expected log-likelihood statistics, and to successfully compensate for them without ever losing track.
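
The importance-sample-then-mode-track idea can be sketched on a 1-D toy problem. The model, noise levels, and variable names below are illustrative stand-ins, not the paper's image-based observation model: motion is importance sampled, while each particle's illumination estimate is replaced by the closed-form mode of its conditional Gaussian posterior.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in: motion x_t and illumination a_t are both random walks,
# and we observe their sum in noise (all values assumed for illustration).
q, r, s = 0.1, 0.05, 0.2          # motion / illumination / observation variances
T, N = 50, 200                    # time steps, particles

true_x = np.cumsum(rng.normal(0, np.sqrt(q), T))
true_a = 1.0 + np.cumsum(rng.normal(0, np.sqrt(r), T))
y = true_x + true_a + rng.normal(0, np.sqrt(s), T)

x = np.zeros(N)                   # motion particles
a = np.ones(N)                    # per-particle illumination estimates
w = np.full(N, 1.0 / N)
est = np.zeros(T)
for t in range(T):
    # importance sample only the low-dimensional motion state
    x = x + rng.normal(0, np.sqrt(q), N)
    # "mode track" illumination: its conditional posterior is Gaussian and
    # unimodal here, so sampling is replaced by the closed-form mode
    a = (a / r + (y[t] - x) / s) / (1.0 / r + 1.0 / s)
    # reweight by the observation likelihood and normalize
    w = w * np.exp(-0.5 * (y[t] - x - a) ** 2 / s)
    w = w / w.sum()
    est[t] = np.sum(w * (x + a))  # filtered estimate of the observation mean
    if 1.0 / np.sum(w ** 2) < N / 2:   # resample on low effective sample size
        idx = rng.choice(N, N, p=w)
        x, a, w = x[idx], a[idx], np.full(N, 1.0 / N)

rmse = float(np.sqrt(np.mean((est - (true_x + true_a)) ** 2)))
```

Because the illumination posterior is narrow and unimodal, tracking its mode instead of sampling it keeps the particle count needed for the motion state small.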


International Conference on Acoustics, Speech, and Signal Processing | 2017

Very deep convolutional neural networks for raw waveforms

Wei Dai; Chia Dai; Shuhui Qu; Juncheng Li; Samarjit Das

Learning acoustic models directly from the raw waveform data with minimal processing is challenging. Current waveform-based models have generally used very few (∼2) convolutional layers, which might be insufficient for building high-level discriminative features. In this work, we propose very deep convolutional neural networks (CNNs) that directly use time-domain waveforms as inputs. Our CNNs, with up to 34 weight layers, are efficient to optimize over very long sequences (e.g., vector of size 32000), necessary for processing acoustic waveforms. This is achieved through batch normalization, residual learning, and a careful design of down-sampling in the initial layers. Our networks are fully convolutional, without the use of fully connected layers and dropout, to maximize representation learning. We use a large receptive field in the first convolutional layer to mimic bandpass filters, but very small receptive fields subsequently to control the model capacity. We demonstrate the performance gains with the deeper models. Our evaluation shows that the CNN with 18 weight layers outperforms the CNN with 3 weight layers by over 15% in absolute accuracy for an environmental sound recognition task and is competitive with the performance of models using log-mel features.
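
The downsampling design described above can be sketched with simple layer-shape bookkeeping (the kernel sizes and strides below are illustrative choices, not the paper's exact configuration): a wide first filter acts on the raw waveform, aggressive early striding shrinks the 32000-sample sequence, and a stack of small-kernel layers then builds depth cheaply.

```python
# Layer-shape bookkeeping for a deep 1-D CNN on raw waveforms
# (sizes are illustrative; the paper's networks also use batch
# normalization and residual connections, omitted here).

def conv_out(n, kernel, stride):
    """Output length of a 1-D convolution with no padding."""
    return (n - kernel) // stride + 1

n = 32000                                # raw samples (~2 s at 16 kHz)
n = conv_out(n, kernel=80, stride=4)     # wide first filter: bandpass-like
n = conv_out(n, kernel=4, stride=4)      # pooling-style downsampling
for _ in range(8):                       # small receptive fields for depth
    n = conv_out(n, kernel=3, stride=1)
n_final = conv_out(n, kernel=4, stride=4)
# n_final is the sequence length entering global pooling / the classifier
```

The wide-then-narrow kernel pattern keeps the parameter count and model capacity under control while still covering long time spans.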


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2010

Nonstationary Shape Activities: Dynamic Models for Landmark Shape Change and Applications

Samarjit Das; Namrata Vaswani

Our goal is to develop statistical models for the shape change of a configuration of “landmark” points (key points of interest) over time and to use these models for filtering and tracking (to automatically extract landmarks), for synthesis, and for change detection. The term “shape activity” was introduced in recent work to denote a particular stochastic model for the dynamics of landmark shapes (the dynamics after global translation, scale, and rotation effects are normalized for). In that work, only models for stationary shape sequences were proposed. But most “activities” of a set of landmarks, e.g., running, jumping, or crawling, involve large shape changes with respect to the initial shape and hence are nonstationary. The key contribution of this work is a novel approach to defining a generative model for both 2D and 3D nonstationary landmark shape sequences. Greatly improved performance using the proposed models is demonstrated for sequentially filtering noise-corrupted landmark configurations to compute Minimum Mean Procrustes Square Error (MMPSE) estimates of the true shape, and for tracking human activity videos, i.e., using the filtering to predict the locations of the landmarks (body parts) and using this prediction for faster and more accurate landmark extraction from the current image.
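
The Procrustes alignment underlying the MMPSE criterion (removing translation, scale, and rotation before comparing landmark configurations) can be sketched as follows. This is the standard ordinary Procrustes step, not the paper's full shape-dynamics model.

```python
import numpy as np

def procrustes_align(X, Y):
    """Align configuration Y to X, removing translation, scale, and
    rotation (ordinary Procrustes analysis for 2-D landmarks)."""
    Xc = X - X.mean(axis=0)          # remove translation
    Yc = Y - Y.mean(axis=0)
    Xc = Xc / np.linalg.norm(Xc)     # remove scale
    Yc = Yc / np.linalg.norm(Yc)
    # optimal rotation via SVD of the cross-covariance
    U, _, Vt = np.linalg.svd(Yc.T @ Xc)
    return Xc, Yc @ (U @ Vt)

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 2))                      # 5 landmarks in 2-D
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Y = 3.0 * X @ R.T + np.array([2.0, -1.0])        # similarity-transformed copy
Xa, Ya = procrustes_align(X, Y)
err = float(np.linalg.norm(Xa - Ya))             # Procrustes distance, ~0 here
```

Since Y differs from X only by a similarity transform, the residual Procrustes distance is numerically zero; in the filtering setting this residual is what MMPSE measures against the true shape.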


Asilomar Conference on Signals, Systems and Computers | 2010

Particle Filtered Modified Compressive Sensing (PaFiMoCS) for tracking signal sequences

Samarjit Das; Namrata Vaswani

In this paper, we propose a novel algorithm for the recursive reconstruction of a time sequence of sparse signals from highly under-sampled random linear measurements. In our method, the idea of the recently proposed regularized modified compressive sensing (reg-mod-CS) is merged with sequential Monte Carlo techniques such as particle filtering. Reg-mod-CS facilitates sequential reconstruction by utilizing partial knowledge of the support and previous signal estimates. Under the assumption of a dynamical model on the support, the sequential Monte Carlo step generates various candidates for the current support and chooses the most likely support given the current observation. The algorithm is similar in spirit to the particle filter with mode tracker (PF-MT), where the support can be considered the effective basis and the signal values on the current support the residual space; the difference is that in our algorithm, the mode tracking step is replaced by reg-mod-CS for each particle, and the support is re-estimated from the residual part. We compare our algorithm with other techniques such as traditional particle filtering, the particle filter with mode tracker (PF-MT), static compressive sensing, modified compressive sensing, regularized modified compressive sensing, and weighted ℓ1. We demonstrate that our algorithm outperforms all the other methods in terms of reconstruction accuracy on a simulated sequence of sparse signals.
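
The support-constrained reconstruction step that reg-mod-CS builds on can be illustrated with a toy example: when the support is known exactly, recovery from under-sampled linear measurements reduces to least squares on the active columns. Dimensions below are illustrative, and the ℓ1 regularization on the off-support part is omitted.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, k = 100, 30, 5                 # signal length, measurements, sparsity

x = np.zeros(n)
support = rng.choice(n, k, replace=False)
x[support] = rng.normal(0, 1, k)

A = rng.normal(0, 1 / np.sqrt(m), (m, n))   # random Gaussian sensing matrix
y = A @ x                                   # under-sampled measurements

# With the support assumed known, reconstruction is ordinary least
# squares restricted to the k active columns of A.
x_hat = np.zeros(n)
x_hat[support], *_ = np.linalg.lstsq(A[:, support], y, rcond=None)

err = float(np.linalg.norm(x_hat - x) / np.linalg.norm(x))
```

In the full algorithm the support is not known: each particle proposes a candidate support from the dynamical model, and reg-mod-CS then regularizes this step toward the previous estimate rather than trusting the support outright.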


International Conference on Acoustics, Speech, and Signal Processing | 2010

Hiding information inside structured shapes

Samarjit Das; Shantanu Rane; Anthony Vetro

This paper describes a new technique for embedding a message within structured shapes. It is desired that any changes in the shape owing to the embedded message are invisible to a casual observer but detectable by a specialized decoder. The message embedding algorithm represents shape outlines as a set of cubic Bezier curves and straight line segments. By slightly perturbing the Bezier curves, a single shape can spawn a library of similar-looking shapes each corresponding to a unique message. This library is efficiently stored using Adaptively Sampled Distance Fields (ADFs) which also facilitate rendering of the modified shapes at the desired resolution and fidelity. Given any modified shape, a forensic detector applies Procrustes analysis to determine the embedded message. Results of an extensive subjective test confirm that the shape modifications are indeed unobtrusive. Further, to test the recovery of the message bits in noisy physical environments, a text document is put through a print-photocopy-scan process. Message recovery is found to be stable even after multiple rounds of photocopying.
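
A heavily simplified sketch of the bit-embedding idea follows: a hypothetical one-bit-per-curve scheme that nudges a single control point, whereas the paper perturbs curves more carefully, stores the resulting shape library in ADFs, and decodes with Procrustes analysis.

```python
import numpy as np

def bezier(P, ts):
    """Evaluate a cubic Bezier curve with control points P (4x2) at ts."""
    t = ts[:, None]
    return ((1 - t) ** 3 * P[0] + 3 * (1 - t) ** 2 * t * P[1]
            + 3 * (1 - t) * t ** 2 * P[2] + t ** 3 * P[3])

eps = 0.02                                        # perturbation size (made up)
base = np.array([[0, 0], [1, 2], [2, 2], [3, 0]], float)

def embed(bit):
    """Encode one bit by nudging an interior control point up or down."""
    P = base.copy()
    P[1, 1] += eps if bit else -eps
    return P

def detect(P, ts):
    """Decode by comparing the rendered curve with both candidates."""
    d0 = np.linalg.norm(bezier(P, ts) - bezier(embed(0), ts))
    d1 = np.linalg.norm(bezier(P, ts) - bezier(embed(1), ts))
    return int(d1 < d0)

ts = np.linspace(0, 1, 50)
bits = [0, 1, 1, 0]
decoded = [detect(embed(b), ts) for b in bits]
```

Keeping eps small is what makes the modified outline visually indistinguishable from the original while remaining separable for the detector.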


Computer Vision and Image Understanding | 2017

Improved scene identification and object detection on egocentric vision of daily activities

Gonzalo Vaca-Castano; Samarjit Das; Joao P. Sousa; Niels da Vitoria Lobo; Mubarak Shah

This work investigates the relationship between scenes and their associated objects in daily activities under egocentric vision constraints. Daily activities are performed in prototypical scenes that share much of their visual appearance independent of where or by whom the video was recorded. The intrinsic characteristics of egocentric vision suggest that the location where the activity is conducted remains consistent across frames. This paper shows that egocentric scene identification is improved by taking the temporal context into consideration. Moreover, since most objects are typically associated with particular types of scenes, we show that a generic object detection method can also be improved by re-scoring its results according to the scene content. We first show the case where the scene identity is explicitly predicted to improve object detection, and then we show a framework using Long Short-Term Memory (LSTM) where no labeling of the scene type is needed. We performed experiments on the Activities of Daily Living (ADL) public dataset (Pirsiavash and Ramanan, 2012), which is a standard benchmark for egocentric vision.


International Conference on Acoustics, Speech, and Signal Processing | 2013

Tracking sparse signal sequences from nonlinear/non-Gaussian measurements and applications in illumination-motion tracking

R. Sarkar; Samarjit Das; Namrata Vaswani

In this work, we develop algorithms for tracking time sequences of sparse spatial signals with slowly changing sparsity patterns, and other unknown states, from a sequence of nonlinear observations corrupted by (possibly) non-Gaussian noise. A key example of the above problem occurs in tracking moving objects across spatially varying illumination changes, where motion is the small dimensional state while the illumination image is the sparse spatial signal satisfying the slow-sparsity-pattern-change property.


International Conference on Acoustics, Speech, and Signal Processing | 2017

A comparison of Deep Learning methods for environmental sound detection

Juncheng Li; Wei Dai; Florian Metze; Shuhui Qu; Samarjit Das

Environmental sound detection is a challenging application of machine learning because of the noisy nature of the signal and the small amount of (labeled) data that is typically available. This work therefore presents a comparison of several state-of-the-art Deep Learning models on the task and data of the IEEE Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 challenge, classifying sounds into one of fifteen common indoor and outdoor acoustic scenes, such as bus, cafe, car, city center, forest path, library, and train. In total, 13 hours of stereo audio recordings are available, making this one of the largest datasets of its kind.


International Conference on Image Processing | 2015

Improving egocentric vision of daily activities.

Gonzalo Vaca-Castano; Samarjit Das; Joao P. Sousa

In this paper, we investigates the interplay between scene and objects on daily activities under egocentric vision constraints. The nature of egocentric vision implies that the identity of the current scene remains consistent for several frames. We showed that this constraint can be used to improve several scene identification baselines including the current state of the art scene identification method. We also show that the scene identity can be used to improve the object detection. In generic object detection, models for objects typically only considers local context, ignoring the global scene context; however in daily activities, objects are typically associated to particular types of scenes. We exploited this context clue to re-score the object detectors. Re-scoring function is learned from scene classifiers and object detectors in a validation set. In testing time, models of objects are weighted according to the scene identity score (context) of the tested frame, improving the object detection as measured by mAP, respect to object detectors without the scene identity clue. Our experiments were performed in the Activities of Daily Living (ADL) public dataset [1] which is a standard benchmark for egocentric vision.
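
The re-scoring step can be sketched in a few lines. The class names and prior values below are made up for illustration; in the actual method the weighting function is learned from scene classifiers and detectors on a validation set.

```python
# Toy scene-aware re-scoring: weight raw detector scores by how
# compatible each object class is with the predicted scene
# (hypothetical classes and compatibility values).
scene_prior = {            # compatibility with scene = "kitchen"
    "mug": 0.8,
    "kettle": 0.7,
    "tv": 0.1,
}

detections = [("mug", 0.55), ("tv", 0.60), ("kettle", 0.50)]

rescored = sorted(
    ((obj, score * scene_prior[obj]) for obj, score in detections),
    key=lambda pair: pair[1],
    reverse=True,
)
```

Note how the "tv" detection, although it has the highest raw score, drops to the bottom once the kitchen-scene context is applied, which is exactly the mAP-improving effect described above.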


Asilomar Conference on Signals, Systems and Computers | 2007

Particle Filter with Efficient Importance Sampling and Mode Tracking (PF-EIS-MT) and its Application to Landmark Shape Tracking

Namrata Vaswani; Samarjit Das

Collaboration


Dive into Samarjit Das's collaborations.

Top Co-Authors

Florian Metze (Carnegie Mellon University)

Wei Dai (Carnegie Mellon University)

Joseph Szurley (Katholieke Universiteit Leuven)

Yun Wang (Carnegie Mellon University)

Gonzalo Vaca-Castano (University of Central Florida)