Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Alexandre Alahi is active.

Publications


Featured research published by Alexandre Alahi.


Computer Vision and Pattern Recognition | 2012

FREAK: Fast Retina Keypoint

Alexandre Alahi; Raphael Ortiz; Pierre Vandergheynst

A large number of vision applications rely on matching keypoints across images. The last decade featured an arms race towards faster and more robust keypoints and association algorithms: Scale-Invariant Feature Transform (SIFT) [17], Speeded-Up Robust Features (SURF) [4], and more recently Binary Robust Invariant Scalable Keypoints (BRISK) [16], to name a few. These days, the deployment of vision algorithms on smartphones and embedded devices with low memory and limited computation has upped the ante: the goal is to make descriptors faster to compute and more compact while remaining robust to scale, rotation, and noise. To best address the current requirements, we propose a novel keypoint descriptor inspired by the human visual system, and more precisely the retina, coined Fast Retina Keypoint (FREAK). A cascade of binary strings is computed by efficiently comparing image intensities over a retinal sampling pattern. Our experiments show that FREAKs are in general faster to compute with a lower memory load and also more robust than SIFT, SURF, or BRISK. They are thus competitive alternatives to existing keypoints, in particular for embedded applications.
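To make the matching step concrete, here is a minimal sketch of binary-descriptor matching with FREAK, assuming the opencv-contrib-python package (which ships a FREAK implementation in its xfeatures2d module); the image paths are placeholders. Because the descriptors are binary strings, matching reduces to the Hamming distance, which is what makes them cheap on embedded hardware.

```python
# Minimal FREAK matching sketch, assuming opencv-contrib-python is installed.
import cv2

img1 = cv2.imread("scene_a.png", cv2.IMREAD_GRAYSCALE)  # placeholder paths
img2 = cv2.imread("scene_b.png", cv2.IMREAD_GRAYSCALE)

# FREAK is a descriptor only, so pair it with a fast keypoint detector (BRISK here).
detector = cv2.BRISK_create()
freak = cv2.xfeatures2d.FREAK_create()

kp1 = detector.detect(img1)
kp2 = detector.detect(img2)
kp1, des1 = freak.compute(img1, kp1)
kp2, des2 = freak.compute(img2, kp2)

# Binary strings are compared with the Hamming distance.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
print(f"{len(matches)} cross-checked matches")
```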


European Conference on Computer Vision | 2016

Perceptual Losses for Real-Time Style Transfer and Super-Resolution

Justin Johnson; Alexandre Alahi; Li Fei-Fei

We consider image transformation problems, where an input image is transformed into an output image. Recent methods for such problems typically train feed-forward convolutional neural networks using a per-pixel loss between the output and ground-truth images. Parallel work has shown that high-quality images can be generated by defining and optimizing perceptual loss functions based on high-level features extracted from pretrained networks. We combine the benefits of both approaches, and propose the use of perceptual loss functions for training feed-forward networks for image transformation tasks. We show results on image style transfer, where a feed-forward network is trained to solve the optimization problem proposed by Gatys et al. in real-time. Compared to the optimization-based method, our network gives similar qualitative results but is three orders of magnitude faster. We also experiment with single-image super-resolution, where replacing a per-pixel loss with a perceptual loss gives visually pleasing results.
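As an illustration of the core idea, here is a minimal sketch of a feature-reconstruction perceptual loss, assuming PyTorch and torchvision; the choice of VGG16 layer below (up to relu3_3) is illustrative rather than the paper's exact configuration.

```python
# Minimal perceptual-loss sketch, assuming PyTorch and torchvision.
import torch
import torch.nn.functional as F
from torchvision import models

class PerceptualLoss(torch.nn.Module):
    def __init__(self, layer_index=16):  # indices 0..15 end at relu3_3 in VGG16
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features
        self.slice = torch.nn.Sequential(*list(vgg.children())[:layer_index]).eval()
        for p in self.slice.parameters():
            p.requires_grad = False  # the loss network stays frozen

    def forward(self, output, target):
        # Compare high-level features instead of raw pixels.
        # (ImageNet mean/std normalization omitted for brevity.)
        return F.mse_loss(self.slice(output), self.slice(target))

loss_fn = PerceptualLoss()
pred = torch.rand(1, 3, 256, 256)
gt = torch.rand(1, 3, 256, 256)
print(loss_fn(pred, gt).item())
```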


Computer Vision and Pattern Recognition | 2016

Social LSTM: Human Trajectory Prediction in Crowded Spaces

Alexandre Alahi; Kratarth Goel; Vignesh Ramanathan; Alexandre Robicquet; Li Fei-Fei; Silvio Savarese

Pedestrians follow different trajectories to avoid obstacles and accommodate fellow pedestrians. Any autonomous vehicle navigating such a scene should be able to foresee the future positions of pedestrians and accordingly adjust its path to avoid collisions. This problem of trajectory prediction can be viewed as a sequence generation task, where we are interested in predicting the future trajectory of people based on their past positions. Following the recent success of Recurrent Neural Network (RNN) models for sequence prediction tasks, we propose an LSTM model which can learn general human movement and predict their future trajectories. This is in contrast to traditional approaches which use hand-crafted functions such as Social forces. We demonstrate the performance of our method on several public datasets. Our model outperforms state-of-the-art methods on some of these datasets. We also analyze the trajectories predicted by our model to demonstrate the motion behaviour learned by our model.
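A minimal sketch of the underlying idea, assuming PyTorch: an LSTM consumes past (x, y) positions and is rolled out autoregressively to predict future displacements. It treats each pedestrian independently and omits the paper's social pooling layer, which couples neighbouring trajectories.

```python
# Minimal trajectory-prediction LSTM sketch, assuming PyTorch.
import torch
import torch.nn as nn

class TrajectoryLSTM(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.embed = nn.Linear(2, hidden)     # embed (x, y) positions
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)      # predict the next displacement

    def forward(self, past, horizon=12):
        # past: (batch, T_obs, 2) observed positions
        _, state = self.lstm(self.embed(past))
        pos = past[:, -1]                     # last observed position
        preds = []
        for _ in range(horizon):              # roll out autoregressively
            out, state = self.lstm(self.embed(pos).unsqueeze(1), state)
            pos = pos + self.head(out.squeeze(1))
            preds.append(pos)
        return torch.stack(preds, dim=1)      # (batch, horizon, 2)

model = TrajectoryLSTM()
future = model(torch.rand(4, 8, 2))           # 8 observed steps -> 12 predicted
print(future.shape)
```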


International Conference on Computer Vision | 2015

Learning to Track: Online Multi-object Tracking by Decision Making

Yu Xiang; Alexandre Alahi; Silvio Savarese

Online Multi-Object Tracking (MOT) has wide applications in time-critical video analysis scenarios, such as robot navigation and autonomous driving. In tracking-by-detection, a major challenge of online MOT is how to robustly associate noisy object detections on a new video frame with previously tracked objects. In this work, we formulate the online MOT problem as decision making in Markov Decision Processes (MDPs), where the lifetime of an object is modeled with an MDP. Learning a similarity function for data association is equivalent to learning a policy for the MDP, and the policy learning is approached in a reinforcement learning fashion which benefits from the advantages of both offline and online learning for data association. Moreover, our framework can naturally handle the birth/death and appearance/disappearance of targets by treating them as state transitions in the MDP while leveraging existing online single object tracking methods. We conduct experiments on the MOT Benchmark [24] to verify the effectiveness of our method.
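To illustrate the lifetime-as-MDP formulation, here is a minimal sketch with hard-coded threshold policies and hypothetical detection/association scores; in the paper the association policy is learnt, not hand-set.

```python
# Minimal target-lifetime MDP sketch; thresholds and scores are hypothetical.
from enum import Enum, auto

class State(Enum):
    ACTIVE = auto()    # new detection, not yet confirmed
    TRACKED = auto()   # confirmed and currently associated
    LOST = auto()      # temporarily missing, may be re-associated
    INACTIVE = auto()  # dead target, terminal state

def step(state, detection_score, association_score):
    """Advance one target's lifetime by one frame."""
    if state is State.ACTIVE:
        return State.TRACKED if detection_score > 0.5 else State.INACTIVE
    if state is State.TRACKED:
        return State.TRACKED if association_score > 0.5 else State.LOST
    if state is State.LOST:
        if association_score > 0.7:   # re-identify the target
            return State.TRACKED
        return State.INACTIVE if association_score < 0.1 else State.LOST
    return State.INACTIVE

s = State.ACTIVE
for det, assoc in [(0.9, 0.8), (0.9, 0.3), (0.9, 0.75)]:
    s = step(s, det, assoc)
    print(s)
```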


Journal of Mathematical Imaging and Vision | 2011

Sparsity Driven People Localization with a Heterogeneous Network of Cameras

Alexandre Alahi; Laurent Jacques; Yannick Boursier; Pierre Vandergheynst

This paper addresses the problem of localizing people in low- and high-density crowds with a network of heterogeneous cameras. The problem is recast as a linear inverse problem. It relies on deducing the discretized occupancy vector of people on the ground from the noisy binary silhouettes observed as foreground pixels in each camera. This inverse problem is regularized by imposing a sparse occupancy vector, i.e., made of few non-zero elements, while a particular dictionary of silhouettes linearly maps these non-empty grid locations to the multiple silhouettes viewed by the camera network. The proposed framework is (i) generic to any scene of people, i.e., people are located in low- and high-density crowds, (ii) scalable to any number of cameras and already working with a single camera, (iii) unconstrained by the scene surface to be monitored, and (iv) versatile with respect to the camera's geometry, e.g., planar or omnidirectional. Qualitative and quantitative results are presented on the APIDIS and the PETS 2009 Benchmark datasets. The proposed algorithm successfully detects people occluding each other given severely degraded extracted features, while outperforming state-of-the-art people localization techniques.
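As a concrete illustration of the linear inverse problem y ≈ Dx with a sparsity prior, here is a minimal NumPy sketch that recovers a sparse occupancy vector with ISTA (iterative soft thresholding); the dictionary D is random here, whereas the paper builds it from silhouettes generated by occupied ground-plane cells.

```python
# Minimal sparse-recovery sketch (ISTA); the dictionary is random for illustration.
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_cells = 200, 500          # stacked foreground pixels, ground grid cells
D = rng.standard_normal((n_pixels, n_cells)) / np.sqrt(n_pixels)

x_true = np.zeros(n_cells)            # sparse occupancy: a few people on the grid
x_true[rng.choice(n_cells, 5, replace=False)] = 1.0
y = D @ x_true + 0.01 * rng.standard_normal(n_pixels)

lam = 0.05
L = np.linalg.norm(D, 2) ** 2         # Lipschitz constant of the gradient
x = np.zeros(n_cells)
for _ in range(300):                  # ISTA: gradient step + soft threshold
    x = x - (D.T @ (D @ x - y)) / L
    x = np.sign(x) * np.maximum(np.abs(x) - lam / L, 0.0)

print("recovered support:", np.flatnonzero(x > 0.5))
print("true support:     ", np.flatnonzero(x_true))
```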


International Conference on Image Processing | 2010

Stream carving: An adaptive seam carving algorithm

Daniel Domingues; Alexandre Alahi; Pierre Vandergheynst

We propose a new content-aware image resizing scheme, Stream Carving, which is based on the well-known seam carving method. Our algorithm may introduce larger seams in the retargeted image, i.e., seams with a width larger than one pixel, which we call “streams”. The resulting holes are then recovered using an inpainting method. Our retargeting algorithm is also better aligned with human perception, exploiting an adaptive importance map that merges several features such as gradient magnitude, saliency, face, edge, and straight-line detection. Our approach improves the quality of the retargeted image compared to the original seam carving method and provides similar or better results than other current image retargeting techniques.
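For context, here is a minimal NumPy sketch of the classic seam carving step that stream carving builds on: a gradient-magnitude energy map and a dynamic program that traces the lowest-energy vertical seam. The stream widening and inpainting stages are not shown.

```python
# Minimal seam-carving sketch: energy map + dynamic-programming seam search.
import numpy as np

def find_vertical_seam(gray):
    """Return one column index per row, tracing the cheapest vertical seam."""
    gy, gx = np.gradient(gray.astype(float))
    energy = np.abs(gx) + np.abs(gy)          # simple gradient-magnitude energy

    h, w = energy.shape
    cost = energy.copy()
    for i in range(1, h):                     # accumulate minimal path cost
        left = np.r_[np.inf, cost[i - 1, :-1]]
        right = np.r_[cost[i - 1, 1:], np.inf]
        cost[i] += np.minimum(np.minimum(left, cost[i - 1]), right)

    seam = np.empty(h, dtype=int)             # backtrack from the cheapest end
    seam[-1] = int(np.argmin(cost[-1]))
    for i in range(h - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(0, j - 1), min(w, j + 2)
        seam[i] = lo + int(np.argmin(cost[i, lo:hi]))
    return seam

img = np.random.rand(6, 8)
print(find_vertical_seam(img))
```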


Computer Vision and Pattern Recognition | 2016

Recurrent Attention Models for Depth-Based Person Identification

Albert Haque; Alexandre Alahi; Li Fei-Fei

We present an attention-based model that reasons on human body shape and motion dynamics to identify individuals in the absence of RGB information, hence in the dark. Our approach leverages unique 4D spatio-temporal signatures to address the identification problem across days. Formulated as a reinforcement learning task, our model is based on a combination of convolutional and recurrent neural networks with the goal of identifying small, discriminative regions indicative of human identity. We demonstrate that our model produces state-of-the-art results on several published datasets given only depth images. We further study the robustness of our model towards viewpoint, appearance, and volumetric changes. Finally, we share insights gleaned from interpretable 2D, 3D, and 4D visualizations of our model's spatio-temporal attention.
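As a loose illustration, here is a minimal sketch of recurrent attention over per-frame features, assuming PyTorch; note that the paper uses hard, RL-trained attention on depth input, while this simplified version uses fully differentiable soft attention.

```python
# Minimal recurrent soft-attention sketch over per-frame features.
import torch
import torch.nn as nn

class RecurrentAttention(nn.Module):
    def __init__(self, feat_dim=32, hidden=64, n_ids=10):
        super().__init__()
        # Tiny per-frame encoder; a real model would be much deeper.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, feat_dim, 3, stride=2, padding=1), nn.ReLU())
        self.score = nn.Linear(feat_dim + hidden, 1)
        self.rnn = nn.GRUCell(feat_dim, hidden)
        self.head = nn.Linear(hidden, n_ids)

    def forward(self, frames):                # frames: (B, T, 1, H, W)
        B, T = frames.shape[:2]
        h = frames.new_zeros(B, self.rnn.hidden_size)
        for t in range(T):
            f = self.cnn(frames[:, t])        # (B, C, h', w')
            f = f.flatten(2).transpose(1, 2)  # (B, regions, C)
            q = h.unsqueeze(1).expand(-1, f.size(1), -1)
            a = torch.softmax(self.score(torch.cat([f, q], -1)), dim=1)
            h = self.rnn((a * f).sum(1), h)   # attend, then update state
        return self.head(h)                   # identity logits

model = RecurrentAttention()
print(model(torch.rand(2, 5, 1, 32, 32)).shape)   # torch.Size([2, 10])
```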


European Conference on Computer Vision | 2016

Learning Social Etiquette: Human Trajectory Understanding In Crowded Scenes

Alexandre Robicquet; Amir Sadeghian; Alexandre Alahi; Silvio Savarese

Humans navigate crowded spaces such as a university campus by following common sense rules based on social etiquette. In this paper, we argue that in order to enable the design of new target tracking or trajectory forecasting methods that can take full advantage of these rules, we need to have access to better data in the first place. To that end, we contribute a new large-scale dataset that collects videos of various types of targets (not just pedestrians, but also bikers, skateboarders, cars, buses, and golf carts) that navigate in a real-world outdoor environment such as a university campus. Moreover, we introduce a new characterization that describes the “social sensitivity” at which two targets interact. We use this characterization to define “navigation styles” and improve both forecasting models and state-of-the-art multi-target tracking, whereby the learnt forecasting models help the data association step.
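As a rough illustration of turning an interaction characterization into “navigation styles”, here is a minimal sketch assuming scikit-learn; the mean distance-to-nearest-neighbour feature below is a simplified stand-in for the paper's social-sensitivity measure.

```python
# Minimal navigation-style clustering sketch; the feature is a simplified stand-in.
import numpy as np
from sklearn.cluster import KMeans

def min_neighbor_distance(traj, others):
    """Mean over time of the distance to the closest other target."""
    dists = [min(np.linalg.norm(traj[t] - o[t]) for o in others)
             for t in range(len(traj))]
    return float(np.mean(dists))

rng = np.random.default_rng(0)
trajs = rng.random((20, 15, 2))        # 20 targets, 15 timesteps, (x, y)
features = np.array([
    [min_neighbor_distance(trajs[i], np.delete(trajs, i, axis=0))]
    for i in range(len(trajs))])

styles = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)
print(styles)                           # a navigation-style label per target
```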


Computer Vision and Pattern Recognition | 2017

Unsupervised Learning of Long-Term Motion Dynamics for Videos

Zelun Luo; Boya Peng; De-An Huang; Alexandre Alahi; Li Fei-Fei

We present an unsupervised representation learning approach that compactly encodes the motion dependencies in videos. Given a pair of images from a video clip, our framework learns to predict the long-term 3D motions. To reduce the complexity of the learning framework, we propose to describe the motion as a sequence of atomic 3D flows computed with RGB-D modality. We use a Recurrent Neural Network based Encoder-Decoder framework to predict these sequences of flows. We argue that in order for the decoder to reconstruct these sequences, the encoder must learn a robust video representation that captures long-term motion dependencies and spatial-temporal relations. We demonstrate the effectiveness of our learned temporal representations on activity classification across multiple modalities and datasets such as NTU RGB+D and MSR Daily Activity 3D. Our framework is generic to any input modality, i.e., RGB, depth, and RGB-D videos.
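To illustrate the encoder-decoder structure, here is a minimal PyTorch sketch that predicts a sequence of flattened flow vectors from a pair of frames; the real model uses convolutional encoders and atomic 3D flows computed from RGB-D input, both simplified away here.

```python
# Minimal RNN encoder-decoder sketch for sequence-of-flows prediction.
import torch
import torch.nn as nn

class FlowSeqPredictor(nn.Module):
    def __init__(self, frame_dim=256, flow_dim=128, hidden=128):
        super().__init__()
        self.encoder = nn.GRU(frame_dim, hidden, batch_first=True)
        self.decoder = nn.GRUCell(flow_dim, hidden)
        self.out = nn.Linear(hidden, flow_dim)

    def forward(self, frame_pair, horizon=8):
        # frame_pair: (B, 2, frame_dim), the two input frames flattened
        _, h = self.encoder(frame_pair)
        h = h.squeeze(0)
        flow = frame_pair.new_zeros(h.size(0), self.out.out_features)
        flows = []
        for _ in range(horizon):          # decode one atomic flow per step
            h = self.decoder(flow, h)
            flow = self.out(h)
            flows.append(flow)
        return torch.stack(flows, dim=1)  # (B, horizon, flow_dim)

model = FlowSeqPredictor()
print(model(torch.rand(4, 2, 256)).shape)  # torch.Size([4, 8, 128])
```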


2009 Twelfth IEEE International Workshop on Performance Evaluation of Tracking and Surveillance | 2009

Sparsity-driven people localization algorithm: Evaluation in crowded scenes environments

Alexandre Alahi; Laurent Jacques; Yannick Boursier; Pierre Vandergheynst

We propose to evaluate our sparsity-driven people localization framework on crowded complex scenes. The problem is recast as a linear inverse problem. It relies on deducing an occupancy vector, i.e., the discretized occupancy of people on the ground, from the noisy binary silhouettes observed as foreground pixels in each camera. This inverse problem is regularized by imposing a sparse occupancy vector, i.e., made of few non-zero elements, while a particular dictionary of silhouettes linearly maps these non-empty grid locations to the multiple silhouettes viewed by the camera network. The proposed approach is (i) generic to any scene of people, i.e., people are located in low- and high-density crowds, (ii) scalable to any number of cameras and already working with a single camera, and (iii) unconstrained by the scene surface to be monitored. Qualitative and quantitative results are presented on the PETS 2009 dataset. The proposed algorithm detects, counts, and tracks people in high-density crowds given severely degraded foreground silhouettes.

Collaboration


Dive into Alexandre Alahi's collaboration.

Top Co-Authors

Pierre Vandergheynst
École Polytechnique Fédérale de Lausanne

Michel Bierlaire
École Polytechnique Fédérale de Lausanne

Murat Kunt
École Polytechnique Fédérale de Lausanne

Laurent Jacques
Université catholique de Louvain