Hamid Izadinia
University of Washington
Publications
Featured research published by Hamid Izadinia.
European Conference on Computer Vision | 2012
Hamid Izadinia; Mubarak Shah
In this paper we address the challenging problem of complex event recognition using low-level events. Each complex event is captured by a long video in which several low-level events occur. Because of the large number of videos and the complexity of the events, the available annotation for the low-level events is very noisy, which makes the detection task even more challenging. To tackle these problems we model the joint relationship between the low-level events with a graph that contains a node for each low-level event and an edge between every pair of correlated low-level events. In addition, to reduce the effect of weak and/or irrelevant low-level event detectors, we treat the presence/absence of low-level events as hidden variables and learn a discriminative model using a latent SVM formulation. Beyond complex event recognition, the learned model also improves detection of the low-level events in video clips, which enables us to discover a conceptual description of the video. Our model can therefore recognize complex events and explain a video in terms of low-level events in a single framework. We evaluate the proposed method on the most challenging multimedia event detection dataset. The experimental results reveal that the proposed method performs well compared to the baseline. Further, our conceptual descriptions of video show that the model handles the noisy annotation well and surpasses low-level event detectors trained directly on raw features.
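The latent-variable inference behind this kind of model can be sketched as follows. This is a toy illustration, not the paper's implementation: the detector scores, weights, and exhaustive search over the hidden presence/absence variables are assumptions that are only feasible for a handful of low-level events.

```python
import itertools
import numpy as np

def score_event(det_scores, edges, unary_w, pair_w):
    """Score a complex event by maximizing over hidden presence
    variables h_i in {0,1} for each low-level event (latent-SVM-style
    inference; exhaustive search, feasible only for small n)."""
    n = len(det_scores)
    best, best_h = -np.inf, None
    for h in itertools.product([0, 1], repeat=n):
        # Unary term: detector score counts only if the event is "on".
        s = sum(h[i] * unary_w[i] * det_scores[i] for i in range(n))
        # Pairwise term: reward co-occurrence of correlated events.
        s += sum(pair_w[(i, j)] for (i, j) in edges if h[i] and h[j])
        if s > best:
            best, best_h = s, h
    return best, best_h
```

With a weak detection (score -0.4) the maximization switches that low-level event off, which is how the hidden variables suppress unreliable detectors.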
European Conference on Computer Vision | 2012
Hamid Izadinia; Imran Saleemi; Wenhui Li; Mubarak Shah
We present a method for multi-target tracking that exploits the persistence in detection of object parts. While the implicit representation and detection of body parts have recently been leveraged for improved human detection, ours is the first method that attempts to temporally constrain the locations of human body parts with the express purpose of improving pedestrian tracking. We pose the problem of simultaneously tracking multiple targets and their parts in a network flow optimization framework and show that parts of this network must be optimized separately and iteratively, due to inter-dependencies of node and edge costs. Given potential detections of humans and their parts separately, an initial set of pedestrian tracklets is first obtained, followed by explicit tracking of human parts as constrained by the initial human tracking. A merging step is then performed whereby we attempt to include part-only detections for which the entire human is not observable. This step employs a selective appearance model, which allows us to skip occluded parts in the description of positive training samples. The result is high-confidence, robust trajectories of pedestrians as well as their parts, which essentially constrain each other's locations and associations, thus improving both human tracking and parts detection. We test our algorithm on multiple real datasets and show that the proposed algorithm improves over the state-of-the-art.
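The data-association step at the heart of such a tracker can be sketched with a much simpler stand-in. The greedy gated matching below is an assumption for illustration only; the paper formulates association as min-cost network flow, which is globally optimal, whereas this sketch matches frame to frame by nearest distance.

```python
import numpy as np

def associate(tracks, detections, gate=50.0):
    """Greedy frame-to-frame association by Euclidean distance.
    Simplified stand-in for a min-cost-flow formulation: candidate
    edges are (track, detection) pairs, the cost is the distance,
    and a gate forbids implausible jumps."""
    pairs = sorted(
        (np.linalg.norm(np.subtract(t_pos, d)), ti, di)
        for ti, t_pos in enumerate(tracks)
        for di, d in enumerate(detections)
    )
    used_t, used_d, matches = set(), set(), []
    for cost, ti, di in pairs:  # cheapest edges first
        if cost <= gate and ti not in used_t and di not in used_d:
            matches.append((ti, di))
            used_t.add(ti)
            used_d.add(di)
    return matches
```

A full tracker would run this (or the flow solver) once for whole pedestrians and again for each part, with the part costs constrained by the pedestrian tracklets.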
IEEE Transactions on Multimedia | 2013
Hamid Izadinia; Imran Saleemi; Mubarak Shah
In this paper, we propose a novel method that exploits the correlation between the audio-visual dynamics of a video to segment and localize objects that are the dominant source of audio. Our approach consists of a two-step spatiotemporal segmentation mechanism that relies on the velocity and acceleration of moving objects as visual features. Each frame of the video is segmented into regions using the QuickShift algorithm based on motion and appearance cues; these regions are then clustered over time using K-means to obtain a spatiotemporal video segmentation. The video is represented by motion features computed over individual segments. The Mel-Frequency Cepstral Coefficients (MFCC) of the audio signal, and their first-order derivatives, are used to represent the audio. The proposed framework assumes there is a non-trivial correlation between these audio features and the velocity and acceleration of the moving, sounding objects. Canonical correlation analysis (CCA) is utilized to identify the moving objects that are most correlated with the audio signal. In addition to moving-sounding object identification, the same framework is also exploited to solve the problem of audio-video synchronization, and is used to aid interactive segmentation. We evaluate the performance of our proposed method on challenging videos. Our experiments demonstrate a significant increase in performance over the state-of-the-art, both qualitatively and quantitatively, and validate the feasibility and superiority of our approach.
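The CCA step can be sketched numerically with the standard QR/SVD formulation (canonical correlations are the singular values of the product of the two views' orthonormal bases). This is a generic numpy sketch, not the paper's code; the motion and MFCC feature matrices are assumed to be given.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between two feature views, e.g. X =
    per-segment motion features over time, Y = MFCC audio features.
    Rows are time samples; returns correlations in [0, 1], largest first."""
    X = X - X.mean(axis=0)          # CCA is defined on centered data
    Y = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(X)         # orthonormal basis of each view
    Qy, _ = np.linalg.qr(Y)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)
```

Ranking video segments by their first canonical correlation with the audio features is then a direct way to pick the moving-and-sounding object.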
Computer Vision and Pattern Recognition | 2014
Hamid Izadinia; Fereshteh Sadeghi; Ali Farhadi
A scene category imposes tight distributions over the kinds of objects that might appear in the scene, the appearance of those objects, and their layout. In this paper, we propose a method to learn scene structures that encode three main interlacing components of a scene: the scene category, the context-specific appearance of objects, and their layout. Our experimental evaluations show that our learned scene structures outperform the state-of-the-art Deformable Part Models method in detecting objects in a scene. Our scene structure provides a level of scene understanding that is amenable to deep visual inferences. The scene structures can also generate features for scene categorization, with which we show promising results.
ACM Multimedia | 2015
Hamid Izadinia; Bryan C. Russell; Ali Farhadi; Matthew D. Hoffman; Aaron Hertzmann
This paper proposes direct learning of image classification from image tags in the wild, without filtering. Each wild tag is supplied by the user who shared the image online. Enormous numbers of these tags are freely available, and they give insight into the image categories that matter to users and to image classification. Our main contribution is an analysis of the Flickr 100 Million Image dataset, including several useful observations about the statistics of these tags. We introduce a large-scale robust classification algorithm to handle the inherent noise in these tags, and a calibration procedure to better predict objective annotations. We show that freely available wild tags can yield results similar or superior to those of large databases of costly manual annotations.
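A calibration step of the general kind described above can be sketched with Platt-style sigmoid scaling: fit p = sigmoid(a*score + b) on a small set of objectively annotated examples so that raw tag-classifier scores become usable probabilities. The exact procedure in the paper may differ; this is a generic illustration with assumed scores and labels.

```python
import numpy as np

def platt_calibrate(scores, labels, lr=0.1, iters=2000):
    """Fit p = sigmoid(a * score + b) to binary labels by minimizing
    log loss with plain gradient descent. Generic score calibration,
    not the paper's specific procedure."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=float)
    a, b = 1.0, 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(a * s + b)))
        g = p - y                  # gradient of log loss w.r.t. logit
        a -= lr * np.mean(g * s)
        b -= lr * np.mean(g)
    return a, b
```

After fitting on clean annotations, the learned (a, b) maps the noisy-tag classifier's scores to calibrated probabilities at test time.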
International Conference on Computer Vision | 2015
Hamid Izadinia; Fereshteh Sadeghi; Santosh Kumar Divvala; Hannaneh Hajishirzi; Yejin Choi; Ali Farhadi
We introduce the Segment-Phrase Table (SPT), a large collection of bijective associations between textual phrases and their corresponding segmentations. Leveraging recent progress in object recognition and natural language semantics, we show how to build a high-quality segment-phrase table with minimal human supervision. More importantly, we demonstrate the unique value unleashed by this rich bimodal resource for both vision and natural language understanding. First, we show that fine-grained textual labels facilitate contextual reasoning that helps in satisfying semantic constraints across image segments. This feature enables us to achieve state-of-the-art segmentation results on benchmark datasets. Next, we show that associating high-quality segmentations with textual phrases aids richer semantic understanding and reasoning about those phrases. Leveraging this feature, we motivate the problems of visual entailment and visual paraphrasing, and demonstrate their utility on a large dataset.
IEEE International Conference on Fuzzy Systems | 2009
Hamid Izadinia; Fereshteh Sadeghi; Mohammad Mehdi Ebadzadeh
The Generalized Hough Transform (GHT) is an efficient method for detecting curves by exploiting the duality between points on a curve and the parameters of that curve. However, GHT has practical limitations such as high computational cost and large memory requirements for detecting scaled and rotated objects. In this paper a new method, the Fuzzy Generalized Hough Transform (FGHT), is proposed that alleviates these deficiencies by utilizing the concept of a fuzzy inference system. In FGHT the R-table consists of a set of fuzzy rules which are fired by the gradient direction of edge pixels and vote for the possible location of the center. Moreover, the proposed method can identify the boundary of a rotated and scaled object via a new voting strategy. To evaluate the effectiveness of FGHT, several experiments with scaled, rotated, occluded and noisy images are conducted. The results, compared with two extensions of GHT, reveal that the proposed method can locate and detect the prototype object with the least error under various conditions.
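For context, the crisp R-table mechanism that FGHT fuzzifies can be sketched as follows: each template edge point stores its offset to a reference point, indexed by quantized gradient direction, and image edge points vote through the table. This is the classic translation-only GHT baseline, not the fuzzy extension; the point lists and gradient values are assumed inputs.

```python
import numpy as np
from collections import defaultdict

def build_r_table(template_pts, gradients, ref):
    """Crisp GHT R-table: offset to the reference point, indexed by
    quantized gradient direction (degrees). FGHT replaces these crisp
    bins with fuzzy rules."""
    table = defaultdict(list)
    for (x, y), g in zip(template_pts, gradients):
        table[round(g) % 360].append((ref[0] - x, ref[1] - y))
    return table

def vote(image_pts, gradients, table, shape):
    """Each edge point casts votes for candidate reference points;
    the accumulator peak locates the object (translation only)."""
    acc = np.zeros(shape, dtype=int)
    for (x, y), g in zip(image_pts, gradients):
        for dx, dy in table.get(round(g) % 360, []):
            cx, cy = x + dx, y + dy
            if 0 <= cx < shape[0] and 0 <= cy < shape[1]:
                acc[cx, cy] += 1
    return acc
```

Handling rotation and scale in this crisp scheme requires extra accumulator dimensions, which is exactly the cost in memory and computation that the fuzzy-rule R-table is designed to reduce.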
ACM Multimedia | 2017
Kofi Boakye; Sachin Sudhakar Farfade; Hamid Izadinia; Yannis Kalantidis; Pierre Garrigues
Automated photo tagging has established itself as one of the most compelling applications of deep learning. While deep convolutional neural networks have repeatedly demonstrated top performance on standard datasets for classification, there are a number of often overlooked but important considerations when deploying this technology in a real-world scenario. In this paper, we present our efforts in developing a large-scale photo tagging system for Flickr photo search. We discuss topics including how to 1) select the tags that matter most to our users; 2) develop lightweight, high-performance models for tag prediction; and 3) leverage the power of large amounts of noisy data for training. Our results demonstrate that, for real-world datasets, training exclusively with this noisy data yields performance on par with the standard paradigm of first pre-training on clean data and then fine-tuning. In addition, we observe that the models trained with user-generated data can yield better fine-tuning results when a small amount of clean data is available. As such, we advocate for the approach of harnessing user-generated data in large-scale systems.
Workshop on Applications of Computer Vision | 2013
Hamid Izadinia; Varun Ramakrishna; Kris M. Kitani; Daniel Huber
We evaluate the performance of a widely used tracking-by-detection and data association multi-target tracking pipeline applied to an activity-rich video dataset. In contrast to traditional work on multi-target pedestrian tracking where people are largely assumed to be upright, we use an activity-rich dataset that includes a wide range of body poses derived from actions such as picking up an object, riding a bike, digging with a shovel, and sitting down. For each step of the tracking pipeline, we identify key limitations and offer practical modifications that enable robust multi-target tracking over a range of activities. We show that the use of multiple posture-specific detectors and an appearance-based data association post-processing step can generate non-fragmented trajectories essential for holistic activity understanding.
International Symposium on Neural Networks | 2009
Hamid Izadinia; Fereshteh Sadeghi; Mohammad Mehdi Ebadzadeh
The natural immune system is composed of cells and molecules with complex interactions. Jerne modeled the interactions among immune cells and molecules by introducing the immune network. The immune system provides an effective defense mechanism against foreign substances and, like the neural system, is able to learn from experience. In this paper, Jerne's immune network model is extended and a new classifier based on the extended immune network model and Learning Vector Quantization (LVQ) is proposed. The new classification method is called the Hybrid Fuzzy Neuro-Immune Network based on a Multi-Epitope approach (HFNINME). The performance of the proposed method is evaluated on several benchmark classification problems and compared with two other prominent immune-based classifiers. The experiments reveal that the proposed method yields a parsimonious classifier that classifies data more accurately and more efficiently.
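The LVQ component of such a hybrid can be sketched with the standard LVQ1 update rule: move the nearest prototype toward a sample of its own class and away from a sample of a different class. This shows only the LVQ part; the fuzzy immune-network machinery of HFNINME is omitted, and the prototypes and data below are toy assumptions.

```python
import numpy as np

def lvq1_train(X, y, prototypes, proto_labels, lr=0.1, epochs=20):
    """LVQ1: for each sample, attract the nearest prototype if its
    label matches, otherwise repel it. Returns trained prototypes."""
    P = np.array(prototypes, dtype=float)
    for _ in range(epochs):
        for xi, yi in zip(np.asarray(X, dtype=float), y):
            k = np.argmin(((P - xi) ** 2).sum(axis=1))  # nearest prototype
            step = lr * (xi - P[k])
            P[k] += step if proto_labels[k] == yi else -step
    return P

def lvq_predict(X, P, proto_labels):
    """Assign each sample the label of its nearest prototype."""
    return [proto_labels[np.argmin(((P - xi) ** 2).sum(axis=1))]
            for xi in np.asarray(X, dtype=float)]
```

The prototypes play a role loosely analogous to the memory cells of an immune network: a small, parsimonious set of exemplars that summarizes the training data.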