Publications


Featured research published by Niluthpol Chowdhury Mithun.


IEEE Transactions on Intelligent Transportation Systems | 2012

Detection and Classification of Vehicles From Video Using Multiple Time-Spatial Images

Niluthpol Chowdhury Mithun; Nafi Ur Rashid; S. M. Mahbubur Rahman

Detection and classification of vehicles are two of the most challenging tasks of a video-based intelligent transportation system. Traditional detection and classification methods are computationally expensive and fail in many cases, such as occlusion among vehicles or when the differences between the pixel intensities of vehicles and the background are small. In this paper, a novel detection and classification method is proposed using multiple time-spatial images (TSIs), each obtained from a virtual detection line on the frames of a video. Using multiple TSIs makes it possible to identify latent occlusions among vehicles and to reduce the dependence on pixel-intensity differences between still and moving objects, which increases detection accuracy and improves classification performance. To identify the class of a particular vehicle, a two-step k-nearest-neighbor classification scheme is proposed that utilizes shape-based, shape-invariant, and texture-based features of the segmented regions corresponding to the vehicle in appropriate frames, which are determined from the TSIs of the video. Extensive experiments are carried out on vehicular traffic in varying environments to evaluate the detection and classification performance of the proposed method against existing methods. The results demonstrate that the proposed method provides a significant improvement in counting and classifying vehicles in terms of accuracy and robustness, along with a substantial reduction in execution time, compared with the other methods.
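To make the two-step classification idea concrete, here is a minimal sketch assuming precomputed feature vectors (NumPy arrays) for each segmented vehicle region; the coarse/fine label split and the feature pipeline are illustrative assumptions, not the paper's exact design.

```python
# Minimal two-step k-NN sketch: first predict a coarse class, then refine
# within that class using a classifier trained only on its members.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def two_step_knn(train_feats, train_coarse, train_fine, query_feats, k=5):
    """train_feats: (n, d) array; train_coarse/train_fine: (n,) label arrays.
    Step 1: coarse class (e.g., light vs. heavy vehicle, an assumed split).
    Step 2: fine class within the predicted coarse class (e.g., car vs. van)."""
    coarse_clf = KNeighborsClassifier(n_neighbors=k).fit(train_feats, train_coarse)
    coarse_pred = coarse_clf.predict(query_feats)

    fine_pred = np.empty(len(query_feats), dtype=object)
    for c in np.unique(coarse_pred):
        mask = train_coarse == c
        fine_clf = KNeighborsClassifier(n_neighbors=min(k, int(mask.sum())))
        fine_clf.fit(train_feats[mask], train_fine[mask])
        fine_pred[coarse_pred == c] = fine_clf.predict(query_feats[coarse_pred == c])
    return coarse_pred, fine_pred
```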


International Conference on Electrical and Control Engineering | 2010

Detection and classification of vehicles from a video using time-spatial image

Nafi Ur Rashid; Niluthpol Chowdhury Mithun; Bhadhan Roy Joy; S. M. Mahbubur Rahman

Detection and classification of vehicles are among the most challenging tasks of a video-based intelligent transportation system. Traditional detection and classification methods subtract an estimated still background from a video to find the moving objects. In general, these methods are computationally expensive and in many cases show poor detection and classification performance, especially when the differences between the pixel intensities of vehicles and the background are small. In this paper, we present a novel detection and classification method that analyzes a time-spatial image (TSI), obtained from a virtual line on the frames of a video, so that the dependence on the pixel intensities of still and moving objects may be reduced. First, the TSI is segmented to count the vehicles that cross the virtual line. Then, a feature-based classification scheme is proposed to classify these vehicles. The classification scheme utilizes the shape of the segmented regions of the TSI, as well as that of appropriate frames of the video, to extract certain features of the moving objects. Experimental results on a number of real video sequences demonstrate that the proposed method counts and classifies vehicles more accurately than conventional background-subtraction-based methods.
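As a minimal sketch of how a TSI can be assembled (the video path and line position are placeholders, and OpenCV is assumed for frame access): the pixel row under the virtual detection line is copied out of every frame and stacked over time, so a crossing vehicle leaves a 2-D trace in the resulting image.

```python
# Build a time-spatial image (TSI) by stacking one scanline per video frame.
import cv2
import numpy as np

def build_tsi(video_path, line_y):
    cap = cv2.VideoCapture(video_path)
    rows = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        rows.append(gray[line_y, :])      # pixels under the virtual line
    cap.release()
    return np.stack(rows)                 # shape: (num_frames, frame_width)

tsi = build_tsi("traffic.mp4", line_y=240)  # placeholder path and line position
```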


ACM Multimedia | 2016

Generating Diverse Image Datasets with Limited Labeling

Niluthpol Chowdhury Mithun; Rameswar Panda; Amit K. Roy-Chowdhury

Image datasets play a pivotal role in advancing multimedia and image analysis research. However, most of these datasets are created by extensive human effort and extremely expensive to scale up. There is high chance that we may have no instances for some required concepts in these data-sets or the available instances do not cover the diversity of real-world scenarios. In this regard, several approaches for learning from web images and refining them have been proposed, but these approaches either include significant redundant instances in the dataset or fail to guarantee a diverse enough set to train a robust classifier. In this work, we propose a semi-supervised sparse coding framework to collect a diverse set of images with minimal human effort, which can be used to both create a dataset from scratch or enrich an existing dataset with diverse examples. To evaluate our method, we constructed an image dataset with our framework, which is named as DivNet. Experiments on this dataset demonstrate that our method not only reduces manual effort, but also the created dataset has excellent accuracy, diversity and cross-dataset generalization ability.
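A rough stand-in for the diversity criterion (not the paper's semi-supervised sparse coding formulation): greedily keep a candidate web image only if the features of the already-selected set reconstruct it poorly, here via plain least squares as a simple proxy; the feature matrix and the threshold are assumptions.

```python
# Greedy diversity-driven selection: an image is added only when the selected
# set explains its feature vector poorly, i.e., it contributes something new.
import numpy as np

def select_diverse(features, threshold=0.5):
    """features: (n, d) float array of per-image descriptors."""
    selected = [0]                                 # seed with the first image
    for i in range(1, len(features)):
        basis = features[selected].T               # (d, num_selected)
        coef, *_ = np.linalg.lstsq(basis, features[i], rcond=None)
        residual = np.linalg.norm(features[i] - basis @ coef)
        if residual / np.linalg.norm(features[i]) > threshold:
            selected.append(i)                     # poorly explained -> diverse
    return selected
```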


IEEE Transactions on Image Processing | 2017

Diversity-Aware Multi-Video Summarization

Rameswar Panda; Niluthpol Chowdhury Mithun; Amit K. Roy-Chowdhury

Most video summarization approaches have focused on extracting a summary from a single video; we propose an unsupervised framework for summarizing a collection of videos. We observe that each video in the collection may contain some information that other videos do not have, and thus exploring the underlying complementarity could be beneficial in creating a diverse and informative summary. We develop a novel diversity-aware sparse optimization method for multi-video summarization that explores the complementarity within the videos. Our approach extracts a multi-video summary that is both interesting and representative in describing the whole video collection. To efficiently solve our optimization problem, we develop an alternating minimization algorithm that minimizes the overall objective function with respect to one video at a time while fixing the other videos. Moreover, we introduce a new benchmark dataset, Tour20, that contains 140 videos with multiple manually created summaries, which were acquired in a controlled experiment. Finally, through extensive experiments on the new Tour20 dataset and several other multi-view datasets, we show that the proposed approach clearly outperforms the state-of-the-art methods on two problems: topic-oriented video summarization and multi-view video summarization in a camera network.
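A skeleton of the alternating update pattern described above, with a toy self-expressive objective (reconstruction error plus an l1 sparsity penalty) standing in for the paper's diversity-aware formulation; the cross-video diversity coupling is omitted for brevity, and step sizes are illustrative.

```python
# Toy alternating minimization: each video's coefficient matrix Z[v] is
# updated in turn while all the others stay fixed.
import numpy as np

def update_video(Xv, Zv, lam=0.1, step=1e-3, iters=100):
    # Proximal gradient descent on ||Xv - Xv @ Zv||_F^2 + lam * ||Zv||_1
    for _ in range(iters):
        grad = -2 * Xv.T @ (Xv - Xv @ Zv)
        Zv = Zv - step * grad
        Zv = np.sign(Zv) * np.maximum(np.abs(Zv) - step * lam, 0.0)  # soft-threshold
    return Zv

def summarize(X, rounds=5):
    """X: list of (dim, num_frames) feature matrices, one per video."""
    Z = [np.zeros((x.shape[1], x.shape[1])) for x in X]
    for _ in range(rounds):
        for v in range(len(X)):        # minimize w.r.t. one video at a time
            Z[v] = update_video(X[v], Z[v])
    # frames whose coefficient rows have large norms act as representatives
    return [np.linalg.norm(z, axis=1).argsort()[::-1] for z in Z]
```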


International Conference on Multimedia Retrieval | 2018

Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval

Niluthpol Chowdhury Mithun; Juncheng Li; Florian Metze; Amit K. Roy-Chowdhury

Constructing a joint representation invariant across different modalities (e.g., video, language) is of significant importance in many multimedia applications. While there are a number of recent successes in developing effective image-text retrieval methods by learning joint representations, the video-text retrieval task, however, has not been explored to its fullest extent. In this paper, we study how to effectively utilize available multimodal cues from videos for the cross-modal video-text retrieval task. Based on our analysis, we propose a novel framework that simultaneously utilizes multimodal features (different visual characteristics, audio inputs, and text) by a fusion strategy for efficient retrieval. Furthermore, we explore several loss functions in training the embedding and propose a modified pairwise ranking loss for the task. Experiments on MSVD and MSR-VTT datasets demonstrate that our method achieves significant performance gain compared to the state-of-the-art approaches.
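For reference, here is a minimal PyTorch sketch of the standard bidirectional max-margin pairwise ranking loss that this family of methods builds on; the paper's specific modification is not reproduced here, and the margin value is an assumption.

```python
# Bidirectional max-margin ranking loss over a batch of matched
# video/text embedding pairs (row i of each matrix corresponds to row i).
import torch

def ranking_loss(video_emb, text_emb, margin=0.2):
    """video_emb, text_emb: (batch, dim) tensors, assumed L2-normalized."""
    scores = video_emb @ text_emb.t()                 # cosine similarity matrix
    pos = scores.diag().view(-1, 1)                   # matched-pair scores
    # penalize negatives that come within `margin` of the matched pair,
    # in both retrieval directions (video->text and text->video)
    cost_t = (margin + scores - pos).clamp(min=0)     # text negatives
    cost_v = (margin + scores - pos.t()).clamp(min=0) # video negatives
    mask = torch.eye(scores.size(0), dtype=torch.bool)
    return cost_t.masked_fill(mask, 0).sum() + cost_v.masked_fill(mask, 0).sum()
```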


Information Processing in Sensor Networks | 2018

ODDS: real-time object detection using depth sensors on embedded GPUs

Niluthpol Chowdhury Mithun; Sirajum Munir; Karen Guo; Charles Shelton

Detecting objects that are carried when someone enters or exits a room is very useful for a wide range of smart-building applications, including safety, security, and energy efficiency. While there has been a significant amount of work on object recognition using large-scale RGB image datasets, RGB cameras are too privacy-invasive for many smart-building applications, and they work poorly in the dark. Additionally, deep object detection networks require powerful and expensive GPUs. We propose a novel system, ODDS (Object Detector using a Depth Sensor), that can detect objects in real time using only raw depth data on an embedded GPU, e.g., an NVIDIA Jetson TX1. Hence, our solution is significantly less privacy-invasive (even if the sensor is compromised) and less expensive, while maintaining accuracy comparable to state-of-the-art solutions. Specifically, we train a deep convolutional neural network on raw depth images, use curriculum-based learning to improve accuracy by accounting for the complexity and imbalance of the object classes, and develop a sparse-coding-based technique that speeds up the system roughly 2x with minimal loss of accuracy. Based on a complete implementation and real-world evaluation, ODDS achieves 80.14% mean average precision for real-time object detection (5-6 FPS) on a Jetson TX1.
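A sketch of the curriculum-learning schedule in isolation: examples are sorted by a precomputed per-sample difficulty score (how that score is obtained is an assumption here, e.g., class rarity or loss under a warm-up model), and the training pool grows from the easiest samples to the full set.

```python
# Curriculum batch sampler sketch: start on easy samples, gradually expose all.
import numpy as np

def curriculum_batches(difficulty, batch_size, num_epochs, seed=0):
    """difficulty: (n,) array of precomputed per-sample difficulty scores."""
    rng = np.random.default_rng(seed)
    order = np.argsort(difficulty)                      # easiest first
    for epoch in range(num_epochs):
        # grow the pool linearly from 30% of the data to 100%
        frac = 0.3 + 0.7 * epoch / max(1, num_epochs - 1)
        pool = order[: max(batch_size, int(frac * len(order)))].copy()
        rng.shuffle(pool)
        for i in range(0, len(pool), batch_size):
            yield epoch, pool[i : i + batch_size]       # indices into the dataset
```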


ACM Multimedia | 2018

Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval

Niluthpol Chowdhury Mithun; Rameswar Panda; Evangelos E. Papalexakis; Amit K. Roy-Chowdhury

Cross-modal retrieval between visual data and natural language descriptions remains a long-standing challenge in multimedia. While recent image-text retrieval methods offer great promise by learning deep representations aligned across modalities, most of these methods are plagued by training on small-scale datasets that cover a limited number of images with ground-truth sentences. Moreover, it is extremely expensive to create a larger dataset by annotating millions of images with sentences, and doing so may lead to a biased model. Inspired by the recent success of webly supervised learning in deep neural networks, we capitalize on readily available web images with noisy annotations to learn a robust image-text joint representation. Specifically, our main idea is to leverage web images and their corresponding tags, along with fully annotated datasets, to learn the visual-semantic joint embedding. We propose a two-stage approach that augments a typical supervised pairwise ranking loss formulation with weakly annotated web images to learn a more robust visual-semantic embedding. Experiments on two standard benchmark datasets demonstrate that our method achieves a significant performance gain in image-text retrieval compared to state-of-the-art approaches.
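A skeleton of the two-stage recipe described above, assuming hypothetical embed_image/embed_text model methods and standard PyTorch-style data loaders and optimizer; every name here is a placeholder standing in for the paper's actual components. The loss_fn could be a ranking loss like the sketch shown earlier.

```python
# Two-stage webly supervised training skeleton: first fit the joint embedding
# on noisy web image-tag pairs, then fine-tune on clean image-sentence pairs.
def train_two_stage(model, web_loader, clean_loader, loss_fn, opt,
                    web_epochs=5, clean_epochs=15):
    for _ in range(web_epochs):                  # stage 1: noisy web supervision
        for images, tags in web_loader:
            opt.zero_grad()
            loss_fn(model.embed_image(images), model.embed_text(tags)).backward()
            opt.step()
    for _ in range(clean_epochs):                # stage 2: clean fine-tuning
        for images, sentences in clean_loader:
            opt.zero_grad()
            loss_fn(model.embed_image(images), model.embed_text(sentences)).backward()
            opt.step()
    return model
```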


Expert Systems With Applications | 2016

Video-based tracking of vehicles using multiple time-spatial images

Niluthpol Chowdhury Mithun; Tamanna Howlader; S. M. Mahbubur Rahman


Workshop on Applications of Computer Vision | 2018

Learning Long-Term Invariant Features for Vision-Based Localization

Niluthpol Chowdhury Mithun; Cody Simons; Robert Casey; Stefan Hilligardt; Amit K. Roy-Chowdhury


International Conference on Multimedia and Expo | 2018

Deep Learning Based Identity Verification in Renaissance Portraits

Akash Gupta; Niluthpol Chowdhury Mithun; Conrad Rudolph; Amit K. Roy-Chowdhury

Collaboration


Dive into Niluthpol Chowdhury Mithun's collaborations.

Top Co-Authors

Rameswar Panda (University of California)

S. M. Mahbubur Rahman (Bangladesh University of Engineering and Technology)

Nafi Ur Rashid (Bangladesh University of Engineering and Technology)

Akash Gupta (University of California)

Conrad Rudolph (University of California)

Dorian Perkins (University of California)

Florian Metze (Carnegie Mellon University)