
Publications


Featured research published by Atousa Torabi.


International Conference on Computer Vision (ICCV) | 2015

Describing Videos by Exploiting Temporal Structure

Li Yao; Atousa Torabi; Kyunghyun Cho; Nicolas Ballas; Chris Pal; Hugo Larochelle; Aaron C. Courville

Recent progress in using recurrent neural networks (RNNs) for image description has motivated the exploration of their application to video description. However, while images are static, working with videos requires modeling their dynamic temporal structure and then properly integrating that information into a natural language description model. In this context, we propose an approach that successfully takes into account both the local and global temporal structure of videos to produce descriptions. First, our approach incorporates a spatio-temporal 3-D convolutional neural network (3-D CNN) representation of short temporal dynamics. The 3-D CNN representation is trained on video action recognition tasks, so as to produce a representation that is tuned to human motion and behavior. Second, we propose a temporal attention mechanism that goes beyond local temporal modeling and learns to automatically select the most relevant temporal segments given the text-generating RNN. Our approach exceeds the current state of the art on both the BLEU and METEOR metrics on the Youtube2Text dataset. We also present results on a new, larger and more challenging dataset of paired video and natural language descriptions.
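
The temporal attention mechanism lends itself to a compact illustration. The sketch below is not the authors' implementation: every name and dimension in it (segment_feats, decoder_state, the projection matrices) is hypothetical, and it shows only the generic soft-attention pattern of scoring per-segment 3-D CNN features against the decoder state and averaging the features under the resulting weights.

```python
import numpy as np

def temporal_attention(segment_feats, decoder_state, W_f, W_h, w):
    """Soft temporal attention: score each video segment against the
    current decoder state, then average features under those scores.

    segment_feats : (T, F) per-segment 3-D CNN features (hypothetical dims)
    decoder_state : (H,) hidden state of the text-generating RNN
    W_f, W_h, w   : learned projection parameters
    """
    # Unnormalized relevance score for each of the T temporal segments.
    scores = np.tanh(segment_feats @ W_f + decoder_state @ W_h) @ w  # (T,)
    # Softmax over time gives the attention distribution.
    alphas = np.exp(scores - scores.max())
    alphas /= alphas.sum()
    # Expected feature under the attention weights conditions the next word.
    return alphas @ segment_feats, alphas

# Toy usage with random parameters (illustration only).
rng = np.random.default_rng(0)
T, F, H, A = 8, 16, 12, 10
context, alphas = temporal_attention(
    rng.normal(size=(T, F)), rng.normal(size=H),
    rng.normal(size=(F, A)), rng.normal(size=(H, A)), rng.normal(size=A))
```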


International Conference on Multimodal Interfaces | 2013

Combining modality specific deep neural networks for emotion recognition in video

Samira Ebrahimi Kahou; Chris Pal; Xavier Bouthillier; Pierre Froumenty; Caglar Gulcehre; Roland Memisevic; Pascal Vincent; Aaron C. Courville; Yoshua Bengio; Raul Chandias Ferrari; Mehdi Mirza; Sébastien Jean; Pierre-Luc Carrier; Yann N. Dauphin; Nicolas Boulanger-Lewandowski; Abhishek Aggarwal; Jeremie Zumer; Pascal Lamblin; Jean-Philippe Raymond; Guillaume Desjardins; Razvan Pascanu; David Warde-Farley; Atousa Torabi; Arjun Sharma; Emmanuel Bengio; Myriam Côté; Kishore Reddy Konda; Zhenzhou Wu

In this paper we present the techniques used for the University of Montréal's team submissions to the 2013 Emotion Recognition in the Wild Challenge. The challenge is to classify the emotions expressed by the primary human subject in short video clips extracted from feature-length movies. This involves the analysis of video clips of acted scenes lasting approximately one to two seconds, including the audio track, which may contain human voices as well as background music. Our approach combines multiple deep neural networks for different data modalities, including: (1) a deep convolutional neural network for the analysis of facial expressions within video frames; (2) a deep belief net to capture audio information; (3) a deep autoencoder to model the spatio-temporal information produced by the human actions depicted within the entire scene; and (4) a shallow network architecture focused on extracted features of the mouth of the primary human subject in the scene. We discuss each of these techniques, their performance characteristics, and different strategies to aggregate their predictions. Our best single model was a convolutional neural network trained to predict emotions from static frames using two large data sets, the Toronto Face Database and our own set of face images harvested from Google image search, followed by a per-frame aggregation strategy that used the challenge training data. This yielded a test set accuracy of 35.58%. Using our best strategy for aggregating our top-performing models into a single predictor, we were able to produce an accuracy of 41.03% on the challenge test set. These results compare favorably to the challenge baseline test set accuracy of 27.56%.
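
One of the aggregation strategies discussed above, simple weighted averaging of per-model class probabilities, can be sketched in a few lines. This is an illustration, not the team's code; the model names and weights below are made up.

```python
import numpy as np

def aggregate_predictions(model_probs, weights=None):
    """Fuse per-modality emotion predictions by weighted averaging.

    model_probs : list of (num_classes,) probability vectors, one per
                  modality-specific model (video CNN, audio net, etc.)
    weights     : optional per-model weights, e.g. tuned on validation data
    """
    probs = np.stack(model_probs)          # (num_models, num_classes)
    if weights is None:
        weights = np.ones(len(model_probs))
    weights = np.asarray(weights, dtype=float)
    weights /= weights.sum()
    fused = weights @ probs                # convex combination of models
    return int(fused.argmax()), fused

# Example: three hypothetical models voting over 7 emotion classes.
video = np.array([.10, .50, .10, .10, .10, .05, .05])
audio = np.array([.20, .30, .20, .10, .10, .05, .05])
mouth = np.array([.10, .60, .10, .05, .05, .05, .05])
label, fused = aggregate_predictions([video, audio, mouth], weights=[2, 1, 1])
```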


Computer Vision and Image Understanding | 2012

An iterative integrated framework for thermal-visible image registration, sensor fusion, and people tracking for video surveillance applications

Atousa Torabi; Guillaume Massé; Guillaume-Alexandre Bilodeau

In this work, we propose a new integrated framework that addresses the problems of thermal-visible video registration, sensor fusion, and people tracking for far-range videos. The video registration is based on a RANSAC trajectory-to-trajectory matching, which estimates an affine transformation matrix that maximizes the overlap of thermal and visible foreground pixels. Sensor fusion uses the aligned images to compute sum-rule silhouettes and then constructs thermal-visible object models. Finally, multiple object tracking uses the blobs constructed in sensor fusion to output trajectories. Results demonstrate that our integrated framework obtains better image registration and tracking than methods that perform registration and tracking separately.
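
A minimal sketch of the RANSAC step, under simplifying assumptions: the paper scores each affine hypothesis by the overlap of thermal and visible foreground pixels, while this illustration uses a plain inlier-count criterion so it stays self-contained. All names and thresholds are hypothetical.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine transform mapping src (N, 2) onto dst (N, 2)."""
    A = np.hstack([src, np.ones((len(src), 1))])   # homogeneous coordinates
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)    # (3, 2) affine parameters
    return M

def ransac_affine(src, dst, iters=500, thresh=3.0, seed=0):
    """RANSAC over trajectory point correspondences.

    The published method scores each hypothesis by foreground-pixel
    overlap; this sketch substitutes the simpler inlier-count criterion.
    """
    rng = np.random.default_rng(seed)
    best_M, best_inliers = None, 0
    hom = np.hstack([src, np.ones((len(src), 1))])
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)  # minimal sample
        M = fit_affine(src[idx], dst[idx])
        residuals = np.linalg.norm(hom @ M - dst, axis=1)
        inliers = int((residuals < thresh).sum())
        if inliers > best_inliers:
            best_M, best_inliers = M, inliers
    return best_M, best_inliers
```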


Pattern Recognition | 2013

Local self-similarity-based registration of human ROIs in pairs of stereo thermal-visible videos

Atousa Torabi; Guillaume-Alexandre Bilodeau

For several years, mutual information (MI) has been the classic multimodal similarity measure. The robustness of MI is closely tied to the choice of MI window size. For unsupervised human monitoring applications, obtaining appropriate window sizes for computing MI in videos with multiple people of different sizes and at different levels of occlusion is problematic. In this work, we apply local self-similarity (LSS) as a dense multimodal similarity metric and show its adequacy and strengths compared to MI for human ROI registration. We also propose an LSS-based registration of thermal-visible stereo videos that addresses the problem of multiple people and occlusions in the scene. Our method improves the accuracy of the state-of-the-art disparity voting (DV) correspondence algorithm by adding a motion segmentation step that approximates depth segments in an image and enables assigning a disparity to each depth segment using a larger matching window while preserving registration accuracy. We demonstrate that our registration method outperforms the recent state-of-the-art MI-based stereo registration on several realistic close-range indoor thermal-visible stereo videos of multiple people.

Highlights:
- We propose a local self-similarity (LSS)-based multimodal correspondence measure.
- We comparatively study MI and LSS for thermal-visible stereo matching.
- We propose an LSS-based registration method for human monitoring.
- LSS-based registration is more accurate than MI-based registration.
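
For readers unfamiliar with the LSS descriptor, the sketch below follows the general recipe of Shechtman and Irani (2007): correlate a small patch with its surrounding region and max-pool the correlation surface into log-polar bins. Patch, region, and bin sizes here are illustrative, not the ones used in the paper.

```python
import numpy as np

def lss_descriptor(img, y, x, patch=5, region=21, radial_bins=3, angular_bins=8):
    """Local self-similarity descriptor at (y, x). Assumes (y, x) lies at
    least region//2 + patch//2 pixels from every image border."""
    half_p, half_r = patch // 2, region // 2
    center = img[y - half_p:y + half_p + 1, x - half_p:x + half_p + 1]
    # Correlation surface: similarity of the center patch to every patch
    # in the surrounding region (SSD mapped through an exponential).
    surface = np.zeros((region, region))
    for dy in range(-half_r, half_r + 1):
        for dx in range(-half_r, half_r + 1):
            cand = img[y + dy - half_p:y + dy + half_p + 1,
                       x + dx - half_p:x + dx + half_p + 1]
            ssd = np.sum((center - cand) ** 2)
            surface[dy + half_r, dx + half_r] = np.exp(-ssd / 1e4)  # arbitrary scale
    # Max-pool the surface into log-polar bins -> descriptor vector.
    ys, xs = np.mgrid[-half_r:half_r + 1, -half_r:half_r + 1]
    radius = np.hypot(ys, xs)
    angle = np.mod(np.arctan2(ys, xs), 2 * np.pi)
    r_edges = np.logspace(0, np.log10(half_r + 1), radial_bins + 1)
    desc = np.zeros(radial_bins * angular_bins)
    for i in range(radial_bins):
        for j in range(angular_bins):
            mask = ((radius >= r_edges[i]) & (radius < r_edges[i + 1]) &
                    (angle >= j * 2 * np.pi / angular_bins) &
                    (angle < (j + 1) * 2 * np.pi / angular_bins))
            if mask.any():
                desc[i * angular_bins + j] = surface[mask].max()
    return desc / (desc.max() + 1e-8)   # normalize for comparability
```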


Image and Vision Computing | 2011

Visible and infrared image registration using trajectories and composite foreground images

Guillaume-Alexandre Bilodeau; Atousa Torabi; François Morin

The registration of images from multiple types of sensors (particularly infrared sensors and visible color sensors) is a step toward achieving multi-sensor fusion. This paper proposes a registration method using a novel error function. Registration of infrared and visible color images is performed using the trajectories of moving objects obtained through background subtraction and simple tracking. The trajectory points are matched using a RANSAC-based algorithm and a novel registration criterion based on the overlap of foreground pixels in composite foreground images. This criterion allows registration to be performed when there are few trajectories and gives more stable results. Our method was tested and its performance quantified on nine scenarios. It outperforms a related method based only on trajectory points in cases where there are few moving objects.
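
The overlap-of-foreground-pixels idea behind the criterion can be stated compactly. The paper's actual error function is more elaborate; this is only a plausible reading of the composite-foreground overlap, using an intersection-over-union score on boolean masks.

```python
import numpy as np

def foreground_overlap(fg_visible, fg_infrared_warped):
    """Score a candidate registration by how well the warped infrared
    foreground mask overlaps the visible foreground mask. Both arguments
    are boolean arrays of equal shape; higher scores mean better alignment.
    """
    inter = np.logical_and(fg_visible, fg_infrared_warped).sum()
    union = np.logical_or(fg_visible, fg_infrared_warped).sum()
    return inter / union if union else 0.0
```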


Canadian Conference on Computer and Robot Vision | 2009

A Multiple Hypothesis Tracking Method with Fragmentation Handling

Atousa Torabi; Guillaume-Alexandre Bilodeau

In this paper, we present a new multiple hypothesis tracking (MHT) approach. Our tracking method is suitable for online applications, because it labels objects at every frame and estimates the best computed trajectories up to the current frame. In this work we address the problems of object merging and splitting (occlusions) and object fragmentation. Object fragmentation resulting from imperfect background subtraction can easily be confused with splitting objects in a scene, especially in close-range surveillance applications. This subject is not addressed in most MHT methods. In this work, we propose a framework for MHT which distinguishes fragmentation and splitting using their spatial and temporal characteristics and by generating hypotheses only for splitting cases using observations in later frames. This approach results in more accurate data association and a reduced size of the hypothesis graph. Our tracking method is evaluated with various indoor videos.
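
A toy heuristic in the spirit of the fragmentation/splitting distinction, not the paper's actual rule: fragments produced by imperfect background subtraction tend to stay compact and re-merge within a few frames, whereas genuinely splitting objects keep diverging. All thresholds below are invented.

```python
def is_fragmentation(child_tracks, horizon=5, gap_thresh=15):
    """Classify a blob break-up as fragmentation (True) or a real split
    (False) by watching the children over a short horizon.

    child_tracks : list of per-frame (cx, cy) centroids, one list per child
                   blob, covering the frames after the break-up event
    """
    frames = min(horizon, min(len(track) for track in child_tracks))
    for t in range(frames):
        xs = [track[t][0] for track in child_tracks]
        ys = [track[t][1] for track in child_tracks]
        spread = max(max(xs) - min(xs), max(ys) - min(ys))
        if spread > gap_thresh:    # children diverge -> treat as a split
            return False
    return True                    # stayed compact -> fragmentation
```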


IEEE International Symposium on Robotic and Sensors Environments | 2011

A comparative evaluation of multimodal dense stereo correspondence measures

Atousa Torabi; Mahya Najafianrazavi; Guillaume-Alexandre Bilodeau

In this paper, we compare the behavior of four viable dense stereo correspondence measures, which are Normalized Cross-Correlation (NCC), Histograms of Oriented Gradients (HOG), Mutual Information (MI), and Local Self-Similarity (LSS), for thermal-visible human monitoring. Our comparison is based on a Winner Take All (WTA) box matching stereo method. We evaluate the accuracy and the discriminative power of each correspondence measure using challenging thermal-visible pairs of video frames of different people with different poses, clothing, and distances to cameras for close-range human monitoring applications.
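
The WTA box-matching scheme underlying the comparison is straightforward to sketch. The version below implements only the NCC measure; the paper swaps HOG-, MI-, and LSS-based scores through the same sliding-window loop. Window size and disparity range here are arbitrary.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom else 0.0

def wta_disparity(left, right, y, x, win=7, max_disp=32, measure=ncc):
    """Winner-Take-All box matching along a rectified scanline: slide a
    window over candidate disparities and keep the best-scoring one.
    `measure` is pluggable, mirroring the paper's four-way comparison.
    """
    h = win // 2
    ref = left[y - h:y + h + 1, x - h:x + h + 1]
    best_d, best_s = 0, -np.inf
    for d in range(max_disp):
        if x - d - h < 0:                      # candidate window off-image
            break
        cand = right[y - h:y + h + 1, x - d - h:x - d + h + 1]
        s = measure(ref, cand)
        if s > best_s:                         # winner takes all
            best_d, best_s = d, s
    return best_d, best_s
```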


Computer Vision and Pattern Recognition | 2010

Feedback scheme for thermal-visible video registration, sensor fusion, and people tracking

Atousa Torabi; Guillaume Massé; Guillaume-Alexandre Bilodeau

In this work, we propose a feedback scheme for simultaneous thermal-visible video registration, sensor fusion, and tracking for online video surveillance applications. The video registration is based on a RANSAC trajectory-to-trajectory matching that estimates an affine transformation matrix maximizing the correspondence of trajectory points and the overlap of thermal and visible foreground pixels. Sensor fusion uses the aligned images to compute sum-rule blobs for the thermal and visible images and constructs the thermal-visible blobs. Finally, multiple object tracking takes the blobs constructed in sensor fusion as input and outputs the trajectories of moving humans in the scene. We tested our method on long-term indoor and outdoor video sequences and demonstrate the effectiveness of our feedback design in improving the quality of both image registration and tracking.
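
As one plausible reading of the sum-rule fusion step (an assumption, not the paper's exact formulation): average the per-pixel foreground probabilities from the two aligned sensors and threshold the result to obtain the fused silhouette.

```python
import numpy as np

def sum_rule_silhouette(p_fg_thermal, p_fg_visible, thresh=0.5):
    """Sum-rule fusion of aligned foreground maps.

    p_fg_thermal, p_fg_visible : equally shaped arrays in [0, 1] giving the
    per-pixel foreground probability from each sensor after registration.
    """
    fused = 0.5 * (p_fg_thermal + p_fg_visible)   # sum rule (averaged)
    return fused >= thresh                        # binary fused silhouette
```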


International Symposium on Visual Computing | 2008

Measuring an Animal Body Temperature in Thermographic Video Using Particle Filter Tracking

Atousa Torabi; Guillaume-Alexandre Bilodeau; Maxime Lévesque; J.M.P. Langlois; Pablo Lema; Lionel Carmant

Some studies on epilepsy have shown that seizures might change the body temperature of a patient. Furthermore, other work has shown that kainic acid, a drug used to study seizures, modifies the body temperature of laboratory rats. Thus, thermographic cameras may have an important role in investigating seizures. In this paper, we present the methods we have developed to measure the temperature of a moving rat subject to seizures using a thermographic camera and image processing. To accurately measure the body temperature, a particle filter tracker was developed and tested along with an experimental methodology. The obtained measures are compared with a ground truth. The methods were tested on a 2-hour video, and we show that our method achieves promising results.
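
A bootstrap particle filter of the kind mentioned above can be sketched briefly. This is a generic predict/update/resample cycle with an invented observation model (hotter pixels are more likely to belong to the animal), not the tracker developed in the paper.

```python
import numpy as np

def particle_filter_step(particles, frame, motion_std=5.0, seed=None):
    """One predict/update/resample cycle tracking a warm region in a
    thermographic frame.

    particles : (N, 2) candidate (y, x) positions of the animal
    frame     : 2-D array of per-pixel temperature measurements
    """
    rng = np.random.default_rng(seed)
    n, (h, w) = len(particles), frame.shape
    # Predict: random-walk motion model.
    particles = particles + rng.normal(scale=motion_std, size=(n, 2))
    particles[:, 0] = particles[:, 0].clip(0, h - 1)
    particles[:, 1] = particles[:, 1].clip(0, w - 1)
    # Update: hotter pixels get higher likelihood (invented model).
    temps = frame[particles[:, 0].astype(int), particles[:, 1].astype(int)]
    weights = np.exp(temps - temps.max())
    weights /= weights.sum()
    # Resample to avoid weight degeneracy.
    idx = rng.choice(n, size=n, p=weights)
    return particles[idx]
```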


Machine Vision and Applications | 2012

Body temperature estimation of a moving subject from thermographic images

Guillaume-Alexandre Bilodeau; Atousa Torabi; Maxime Lévesque; Charles Ouellet; J. M. Pierre Langlois; Pablo Lema; Lionel Carmant

The continual measurement of the body temperature of a moving subject in a non-invasive way is a challenging task. However, doing so enables the observation of important phenomena with little inconvenience to the subject, and it can be a powerful tool for understanding physiological reactions to diseases and medications. In this paper, we present a method to obtain the body temperature of a moving subject from thermographic images. The camera's output (a measurement for each pixel) is processed with a particle filter tracker, a clustering algorithm, and a Kalman filter to reduce tracking and measurement noise. The method was tested on videos from animal experiments and on a human patient. Tracking performance was then evaluated by comparison with manually selected regions of interest in thermographic images. The method achieves RMS temperature estimation errors of <0.1°C.
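
The Kalman-filter stage admits a compact illustration: a 1-D filter over per-frame temperature readings. The noise variances below are invented, and this is only one plausible realization of the noise-reduction step described above.

```python
import numpy as np

def kalman_smooth_temperature(readings, q=1e-4, r=0.05, t0=37.0, p0=1.0):
    """1-D Kalman filter over a sequence of temperature readings.

    q : process noise variance (how fast true temperature can drift)
    r : measurement noise variance (tracker + sensor noise)
    """
    t, p = t0, p0
    out = []
    for z in readings:
        p += q                  # predict: temperature drifts slowly
        k = p / (p + r)         # Kalman gain
        t += k * (z - t)        # update with the new measurement
        p *= (1 - k)
        out.append(t)
    return np.array(out)

# Noisy readings around 37.4 °C (synthetic illustration).
rng = np.random.default_rng(1)
noisy = 37.4 + 0.3 * rng.standard_normal(100)
smooth = kalman_smooth_temperature(noisy)
```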

Collaboration


Dive into Atousa Torabi's collaboration.

Top Co-Authors

Chris Pal (École Polytechnique de Montréal)
Hugo Larochelle (Université de Sherbrooke)
Charles Ouellet (École Polytechnique de Montréal)
Guillaume Massé (École Polytechnique de Montréal)
Li Yao (Université de Montréal)
Lionel Carmant (Université de Montréal)
Maxime Lévesque (Montreal Neurological Institute and Hospital)
Nicolas Ballas (Université de Montréal)