Afshin Dehghan | Researchain

Publication

Featured researches published by Afshin Dehghan.

European Conference on Computer Vision | 2012

GMCP-Tracker: global multi-object tracking using generalized minimum clique graphs

Amir Roshan Zamir; Afshin Dehghan; Mubarak Shah

Data association is an essential component of any human tracking system. The majority of current methods, such as bipartite matching, incorporate only a limited temporal locality of the sequence into the data association problem, which makes them inherently prone to ID switches and to difficulties caused by long-term occlusion, cluttered background, and crowded scenes. We propose an approach to data association which incorporates both motion and appearance in a global manner. Unlike limited-temporal-locality methods, which incorporate a few frames into the data association problem, we incorporate the whole temporal span and solve the data association problem for one object at a time, while implicitly incorporating the rest of the objects. In order to achieve this, we utilize Generalized Minimum Clique Graphs to solve the optimization problem of our data association method. Our proposed method yields a better-formulated approach to data association, which is supported by our superior results. Experiments show the proposed method makes significant improvements in tracking in the diverse sequences of Town Center [1], TUD-crossing [2], TUD-Stadtmitte [2], PETS2009 [3], and a new sequence called Parking Lot compared to state-of-the-art methods.
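For readers unfamiliar with the frame-to-frame baseline this abstract contrasts itself with, the following is a minimal sketch of bipartite matching between two consecutive frames. It is not the GMCP method. The function name `match_two_frames`, the use of Euclidean distance between detection centers as the cost, and the sample coordinates are all assumptions made for illustration; the exhaustive search over permutations is only practical for a handful of targets, and a real system would typically use the Hungarian algorithm instead.

```python
from itertools import permutations
from math import hypot


def match_two_frames(prev_pts, curr_pts):
    """Brute-force minimum-cost assignment between two frames.

    prev_pts, curr_pts: equal-length lists of (x, y) detection centers
    (hypothetical inputs). Returns (perm, cost), where detection i in the
    previous frame is matched to detection perm[i] in the current frame.
    """
    n = len(prev_pts)
    if n != len(curr_pts):
        raise ValueError("this sketch assumes the same number of detections")
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        # Total Euclidean distance moved under this candidate assignment.
        cost = sum(
            hypot(prev_pts[i][0] - curr_pts[j][0], prev_pts[i][1] - curr_pts[j][1])
            for i, j in enumerate(perm)
        )
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_perm, best_cost


# Two targets whose detections arrive in swapped order in the next frame.
prev_frame = [(0.0, 0.0), (10.0, 0.0)]
curr_frame = [(9.0, 1.0), (1.0, 1.0)]
assignment, total_cost = match_two_frames(prev_frame, curr_frame)
print(assignment)
```

Because each decision uses only two adjacent frames, a target that is occluded for several frames has no detection to be matched with, which is the kind of failure the abstract attributes to limited temporal locality.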

Computer Vision and Pattern Recognition | 2012

Part-based multiple-person tracking with partial occlusion handling

Guang Shu; Afshin Dehghan; Omar Oreifej; Emily Hand; Mubarak Shah

Single-camera multiple-person tracking is often hindered by difficulties such as occlusion and changes in appearance. In this paper, we address such problems by proposing a robust part-based tracking-by-detection framework. Human detection using part models has become quite popular, yet its extension to tracking has not been fully explored. Our approach learns part-based person-specific SVM classifiers which capture the articulations of the human body under dynamically changing appearance and background. With the part-based model, our approach is able to handle partial occlusions in both the detection and the tracking stages. In the detection stage, we select the subset of parts which maximizes the probability of detection, which significantly improves the detection performance in crowded scenes. In the tracking stage, we dynamically handle occlusions by distributing the score of the learned person classifier among its corresponding parts, which allows us to detect and predict partial occlusions and prevents the performance of the classifiers from being degraded. Extensive experiments using the proposed method on several challenging sequences demonstrate state-of-the-art performance in multiple-person tracking.

Computer Vision and Pattern Recognition | 2015

GMMCP tracker: Globally optimal Generalized Maximum Multi Clique problem for multiple object tracking

Afshin Dehghan; Shayan Modiri Assari; Mubarak Shah

Data association is the backbone of many multiple object tracking (MOT) methods. In this paper we formulate data association as a Generalized Maximum Multi Clique problem (GMMCP). We show that this is the ideal case of modeling tracking in real-world scenarios, where all the pairwise relationships between targets in a batch of frames are taken into account. Previous works assume a simplified version of our tracker either in problem formulation or in problem optimization. However, we propose a solution using GMMCP where no simplification is assumed in either step. We show that the NP-hard problem of GMMCP can be formulated as a Binary-Integer Program where, for small and medium-size MOT problems, the solution can be found efficiently. We further propose a speed-up method, employing Aggregated Dummy Nodes for modeling occlusion and missed detection, which reduces the size of the input graph without using any heuristics. We show that, using the speed-up method, our tracker lends itself to real-time implementation, which is plausible in many applications. We evaluated our tracker on six challenging sequences, Town Center, TUD-Crossing, TUD-Stadtmitte, Parking-lot 1, Parking-lot 2 and Parking-lot pizza, and show favorable improvement over the state of the art.

Computer Vision and Pattern Recognition | 2013

Improving an Object Detector and Extracting Regions Using Superpixels

Guang Shu; Afshin Dehghan; Mubarak Shah

We propose an approach to improve the detection performance of a generic detector when it is applied to a particular video. The performance of offline-trained object detectors is usually degraded in unconstrained video environments due to varying illumination, backgrounds and camera viewpoints. Moreover, most object detectors are trained using Haar-like features or gradient features but ignore video-specific features like consistent color patterns. In our approach, we apply a superpixel-based Bag-of-Words (BoW) model to iteratively refine the output of a generic detector. Compared to other related work, our method builds a video-specific detector using superpixels, hence it can handle the problem of appearance variation. Most importantly, using a Conditional Random Field (CRF) along with our superpixel-based BoW model, we develop an algorithm to segment the object from the background. Therefore our method generates an output of the exact object regions instead of the bounding boxes generated by most detectors. In general, our method takes detection bounding boxes of a generic detector as input and generates the detection output with higher average precision and precise object regions. Experiments on four recent datasets demonstrate the effectiveness of our approach, which significantly improves on the state-of-the-art detector by 5-16% in average precision.

Computer Vision and Pattern Recognition | 2015

Target Identity-aware Network Flow for online multiple target tracking

Afshin Dehghan; Yicong Tian; Philip H. S. Torr; Mubarak Shah

In this paper we show that multiple object tracking (MOT) can be formulated in a framework where detection and data association are performed simultaneously. Our method allows us to overcome the confinements of data-association-based MOT approaches, whose performance depends on the object detection results provided at the input level. At the core of our method lies structured learning, which learns a model for each target and infers the best location of all targets simultaneously in a video clip. The inference of our structured learning is done through a new Target Identity-aware Network Flow (TINF), where each node in the network encodes the probability of each target identity belonging to that node. The proposed Lagrangian relaxation optimization finds a high-quality solution to the network. During optimization, a soft spatial constraint is enforced between the nodes of the graph, which helps reduce the ambiguity caused by nearby targets with similar appearance in crowded scenarios. We show that automatically detecting and tracking targets in a single framework can help resolve the ambiguities due to frequent occlusion and heavy articulation of targets. Our experiments involve challenging yet distinct datasets and show that our method can achieve results better than the state of the art.

Archive | 2014

Automatic Detection and Tracking of Pedestrians in Videos with Various Crowd Densities

Afshin Dehghan; Haroon Idrees; Amir Roshan Zamir; Mubarak Shah

Manual analysis of pedestrians and crowds is often impractical for massive datasets of surveillance videos. Automatic tracking of humans is one of the essential abilities for computerized analysis of such videos. In this keynote paper, we present two state-of-the-art methods for automatic pedestrian tracking in videos with low and high crowd density. For videos with low density, we first detect each person using a part-based human detector. Then, we employ a global data association method based on Generalized Graphs for tracking each individual in the whole video. In videos with high crowd density, we track individuals using a scene structured force model and crowd flow modeling. Additionally, we present an alternative approach which utilizes contextual information without the need to learn the structure of the scene. Evaluations show that the presented methods outperform the currently available algorithms on several benchmarks.

ACM Multimedia | 2013

Visual business recognition: a multimodal approach

Amir Roshan Zamir; Afshin Dehghan; Mubarak Shah

In this paper we investigate a new problem called visual business recognition. Automatic identification of businesses in images is an interesting task with plenty of potential applications, especially for mobile device users. We propose a multimodal approach which incorporates business directories, textual information, and web images in a unified framework. We assume the query image is associated with a coarse location tag and utilize business directories to extract an overcomplete list of nearby businesses which may be visible in the image. We use the names of nearby businesses as search keywords in order to automatically collect a set of relevant images from the web and perform image matching between them and the query. Additionally, we employ a text processing method customized for business recognition which is assisted by nearby business names; we fuse the information acquired from image matching and text processing in a probabilistic framework to recognize the businesses. We tested the proposed algorithm on a challenging set of user-uploaded and street view images with promising results for this new application.

Computer Vision and Pattern Recognition | 2014

Improving Semantic Concept Detection through the Dictionary of Visually-Distinct Elements

Afshin Dehghan; Haroon Idrees; Mubarak Shah

A video captures a sequence and interactions of concepts that can be static, for instance, objects or scenes, or dynamic, such as actions. For large datasets containing hundreds of thousands of images or videos, it is impractical to manually annotate all the concepts, or all the instances of a single concept. However, a dictionary of visually-distinct elements (DOVE) can be created automatically from unlabeled videos, and it can capture and express the entire dataset. The downside to this machine-discovered dictionary is meaninglessness, i.e., its elements are devoid of semantics and interpretation. In this paper, we present an approach that leverages the strengths of semantic concepts and the machine-discovered DOVE by learning a relationship between them. Since instances of a semantic concept share visual similarity, the proposed approach uses soft-consensus regularization to learn a mapping that forces instances from each semantic concept to have similar representations. Testing is performed by projecting the query onto the DOVE as well as onto new representations of semantic concepts from training, with non-negativity and unit summation constraints for probabilistic interpretation. We tested our formulation on TRECVID MED and SIN tasks, and obtained encouraging results.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2018

Binary Quadratic Programing for Online Tracking of Hundreds of People in Extremely Crowded Scenes

Afshin Dehghan; Mubarak Shah

Multi-object tracking has been studied for decades. However, when it comes to tracking pedestrians in extremely crowded scenes, only a few works exist. This is an important problem which gives rise to several challenges. Pre-trained object detectors fail to localize targets in crowded sequences. This consequently limits the use of data-association-based multi-target tracking methods, which rely on the outcome of an object detector. Additionally, the small apparent target size makes it challenging to extract features to discriminate targets from their surroundings. Finally, the large number of targets greatly increases computational complexity, which in turn makes it hard to extend existing multi-target tracking approaches to high-density crowd scenarios. In this paper, we propose a tracker that addresses the aforementioned problems and is capable of tracking hundreds of people efficiently. We formulate online crowd tracking as Binary Quadratic Programing. Our formulation employs targets' individual information in the form of appearance and motion as well as contextual cues in the form of neighborhood motion, spatial proximity and grouping, and solves detection and data association simultaneously. In order to solve the proposed quadratic optimization efficiently, where state-of-the-art commercial quadratic programing solvers fail to find the solution in a reasonable amount of time, we propose to use the most recent version of the Modified Frank-Wolfe algorithm, which takes advantage of SWAP-steps to speed up the optimization. We show that the proposed formulation can track hundreds of targets efficiently and improves on state-of-the-art results by significant margins on eleven challenging high-density crowd sequences.
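As background for the optimizer named in this abstract, here is a minimal sketch of the plain Frank-Wolfe iteration on a small quadratic objective. It is not the paper's Modified Frank-Wolfe with SWAP-steps, and it does not reproduce the paper's tracking formulation. The function name `frank_wolfe_simplex`, the choice of the probability simplex as the feasible set, the symmetric matrix `Q`, and the toy inputs are all assumptions made for illustration.

```python
def frank_wolfe_simplex(Q, c, x0, max_iter=100, tol=1e-9):
    """Minimize 0.5 * x^T Q x + c^T x over the probability simplex.

    Plain Frank-Wolfe with exact line search. Q is assumed symmetric
    (hypothetical input); x0 must already lie on the simplex.
    """
    n = len(x0)
    x = list(x0)
    for _ in range(max_iter):
        # Gradient of the quadratic objective at the current point.
        grad = [sum(Q[i][j] * x[j] for j in range(n)) + c[i] for i in range(n)]
        # Linear minimization oracle: the best simplex vertex is the
        # coordinate with the smallest gradient entry.
        k = min(range(n), key=lambda i: grad[i])
        d = [(1.0 if i == k else 0.0) - x[i] for i in range(n)]
        # Frank-Wolfe duality gap; it is zero at a stationary point.
        gap = -sum(grad[i] * d[i] for i in range(n))
        if gap <= tol:
            break
        # Exact step size for a quadratic, clipped to stay feasible.
        curv = sum(d[i] * Q[i][j] * d[j] for i in range(n) for j in range(n))
        step = 1.0 if curv <= 0 else min(1.0, gap / curv)
        x = [x[i] + step * d[i] for i in range(n)]
    return x


# Toy problem: the closest point to the origin on the 2-D simplex.
solution = frank_wolfe_simplex([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [1.0, 0.0])
print(solution)
```

Each iteration only needs a gradient and an argmin over coordinates, with no projection step, which is one reason Frank-Wolfe variants are attractive when the number of variables is large.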

International Conference on Image Processing | 2014

Complex event recognition by latent temporal models of concepts

Ehsan Zare Borzeshi; Afshin Dehghan; Massimo Piccardi; Mubarak Shah

Complex event recognition is an expanding research area aiming to recognize entities of high-level semantics in videos. Typical approaches exploit the so-called “bags” of spatiotemporal features such as STIP, ISA and DTF-HOG; yet, more recently, the notion of concept has emerged as an alternative, intermediate representation with greater descriptive power, and “bags of concepts” have been used for recognition. In this paper we argue that concepts in an event tend to articulate over a discernible temporal structure, and we exploit a temporal model using the scores of concept detectors as measurements. In addition, we propose several heuristics to improve the initialization of the model's latent states and take advantage of the time-sparsity of the concepts. Experimental results on videos from the challenging TRECVID MED 2012 dataset show that the proposed approach achieves an improvement in average precision of 8.92% over comparable bags of concepts, thus validating the use of temporal structure over concepts for complex event recognition.
