Basura Fernando | Researchain

Publication

Featured research published by Basura Fernando.

International Conference on Computer Vision | 2013

Unsupervised Visual Domain Adaptation Using Subspace Alignment

Basura Fernando; Amaury Habrard; Marc Sebban; Tinne Tuytelaars

In this paper, we introduce a new domain adaptation (DA) algorithm where the source and target domains are represented by subspaces described by eigenvectors. In this context, our method seeks a domain adaptation solution by learning a mapping function which aligns the source subspace with the target one. We show that the solution of the corresponding optimization problem can be obtained in a simple closed form, leading to an extremely fast algorithm. We use a theoretical result to tune the unique hyperparameter corresponding to the size of the subspaces. We run our method on various datasets and show that, despite its intrinsic simplicity, it outperforms state-of-the-art DA methods.
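The closed-form alignment described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: it assumes the subspaces are PCA bases and that the mapping minimizes the Frobenius distance between the mapped source basis and the target basis. The function names `pca_basis` and `subspace_alignment` are hypothetical.

```python
import numpy as np

def pca_basis(X, d):
    """Return the top-d principal directions of X as columns (shape D x d)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are eigenvectors of the covariance matrix, sorted by variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d].T

def subspace_alignment(Xs, Xt, d):
    """Align the source subspace with the target one (illustrative sketch)."""
    Ps = pca_basis(Xs, d)  # source basis, orthonormal columns
    Pt = pca_basis(Xt, d)  # target basis, orthonormal columns
    # Because Ps has orthonormal columns, the least-squares minimizer of
    # ||Ps @ M - Pt||_F is simply M = Ps.T @ Pt.
    M = Ps.T @ Pt
    Zs = Xs @ Ps @ M  # source data in target-aligned coordinates
    Zt = Xt @ Pt      # target data in its own subspace
    return Zs, Zt, M
```

A classifier trained on `Zs` with the source labels could then be applied to `Zt`. The subspace size `d` is the single hyperparameter; here it is simply passed in by the caller.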

Computer Vision and Pattern Recognition | 2015

Modeling video evolution for action recognition

Basura Fernando; Efstratios Gavves; M José Oramas; Amir Ghodrati; Tinne Tuytelaars

In this paper we present a method to capture video-wide temporal information for action recognition. We postulate that a function capable of ordering the frames of a video temporally (based on the appearance) captures well the evolution of the appearance within the video. We learn such ranking functions per video via a ranking machine and use the parameters of these as a new video representation. The proposed method is easy to interpret and implement, fast to compute and effective in recognizing a wide variety of actions. We perform a large number of evaluations on datasets for generic action recognition (Hollywood2 and HMDB51), fine-grained actions (MPII cooking activities) and gestures (Chalearn). Results show that the proposed method brings an absolute improvement of 7-10%, while being compatible with and complementary to further improvements in appearance and local motion based methods.
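The idea of using the parameters of a per-video ordering function as the video descriptor can be sketched as follows. This is a simplified illustration under stated assumptions: it swaps the ranking machine for an ordinary least-squares fit of frame features to their time index, and it smooths the frames with a running mean first. The names `time_varying_mean` and `rank_pool` are hypothetical.

```python
import numpy as np

def time_varying_mean(frames):
    """Running mean of frame features; frames has shape (T, D)."""
    T = frames.shape[0]
    return np.cumsum(frames, axis=0) / np.arange(1, T + 1)[:, None]

def rank_pool(frames):
    """Fit u so that the score u . v_t grows with the frame index t.

    Assumption: least-squares regression stands in for the ranking machine.
    The fitted parameter vector u is returned as the video representation.
    """
    V = time_varying_mean(frames)
    t = np.arange(1, frames.shape[0] + 1, dtype=float)
    u, *_ = np.linalg.lstsq(V, t, rcond=None)
    return u
```

Each video yields one vector `u` of the same dimension as the frame features, so videos of different lengths map to fixed-size descriptors that a standard classifier can consume.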

International Conference on Computer Vision | 2013

Fine-Grained Categorization by Alignments

Efstratios Gavves; Basura Fernando; Cees G. M. Snoek; Arnold W. M. Smeulders; Tinne Tuytelaars

The aim of this paper is fine-grained categorization without human interaction. Different from prior work, which relies on detectors for specific object parts, we propose to localize distinctive details by roughly aligning the objects using just the overall shape, since implicit to fine-grained categorization is the existence of a super-class shape shared among all classes. The alignments are then used to transfer part annotations from training images to test images (supervised alignment), or to blindly yet consistently segment the object in a number of regions (unsupervised alignment). We furthermore argue that in the distinction of fine-grained sub-categories, classification-oriented encodings like Fisher vectors are better suited for describing localized information than popular matching-oriented features like HOG. We evaluate the method on the CU-2011 Birds and Stanford Dogs fine-grained datasets, outperforming the state-of-the-art.

Computer Vision and Pattern Recognition | 2016

Dynamic Image Networks for Action Recognition

Hakan Bilen; Basura Fernando; Efstratios Gavves; Andrea Vedaldi; Stephen Gould

We introduce the concept of dynamic image, a novel compact representation of videos useful for video analysis especially when convolutional neural networks (CNNs) are used. The dynamic image is based on the rank pooling concept and is obtained through the parameters of a ranking machine that encodes the temporal evolution of the frames of the video. Dynamic images are obtained by directly applying rank pooling on the raw image pixels of a video producing a single RGB image per video. This idea is simple but powerful as it enables the use of existing CNN models directly on video data with fine-tuning. We present an efficient and effective approximate rank pooling operator, speeding it up orders of magnitude compared to rank pooling. Our new approximate rank pooling CNN layer allows us to generalize dynamic images to dynamic feature maps and we demonstrate the power of our new representations on standard benchmarks in action recognition achieving state-of-the-art performance.
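An approximate rank pooling operator of this kind can be illustrated as a fixed weighted sum of the frames. The sketch below assumes linear coefficients of the form 2t - T - 1 applied directly to the raw frames, which is one simple variant and not necessarily the exact operator used in the paper. The function names are hypothetical.

```python
import numpy as np

def rank_pool_coefficients(T):
    """Assumed linear weights alpha_t = 2t - T - 1 for t = 1..T (they sum to zero)."""
    return 2.0 * np.arange(1, T + 1) - T - 1

def approx_dynamic_image(video):
    """Collapse a video of shape (T, H, W, C) into one (H, W, C) map.

    Later frames get positive weight and earlier frames negative weight,
    so static content cancels out and temporal change is emphasized.
    """
    alpha = rank_pool_coefficients(video.shape[0])
    return np.tensordot(alpha, video, axes=(0, 0))
```

To feed the result to an image CNN, the map would typically be rescaled to a valid pixel range first; that step is omitted here.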

International Conference on Computer Vision | 2015

Guiding the Long-Short Term Memory Model for Image Caption Generation

Xu Jia; Efstratios Gavves; Basura Fernando; Tinne Tuytelaars

In this work we focus on the problem of image caption generation. We propose an extension of the long short-term memory (LSTM) model, which we coin gLSTM for short. In particular, we add semantic information extracted from the image as extra input to each unit of the LSTM block, with the aim of guiding the model towards solutions that are more tightly coupled to the image content. Additionally, we explore different length normalization strategies for beam search to avoid bias towards short sentences. On various benchmark datasets such as Flickr8K, Flickr30K and MS COCO, we obtain results that are on par with or better than the current state-of-the-art.
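The bias towards short sentences arises because a caption's score is a sum of negative log-probabilities, so every extra word lowers it. The sketch below shows one common normalization, dividing by the caption length. It is an illustrative assumption, not necessarily one of the strategies the paper evaluates, and `caption_score` is a hypothetical helper.

```python
import math

def caption_score(log_probs, normalize=True):
    """Score a candidate caption from its per-word log-probabilities.

    With normalize=False the raw sum is returned, which favors short captions.
    With normalize=True the sum is divided by the number of words.
    """
    total = sum(log_probs)
    return total / len(log_probs) if normalize else total

# Two hypothetical beam candidates: a short one and a longer, more confident one.
short = [math.log(0.5)] * 2
long_ = [math.log(0.6)] * 4

raw_winner = max([short, long_], key=lambda c: caption_score(c, normalize=False))
norm_winner = max([short, long_], key=lambda c: caption_score(c, normalize=True))
```

Under the raw score the two-word candidate wins, while under the length-normalized score the four-word candidate wins.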

European Conference on Computer Vision | 2016

SPICE: Semantic Propositional Image Caption Evaluation

Peter Anderson; Basura Fernando; Mark Johnson; Stephen Gould

There is considerable interest in the task of automatically generating image captions. However, evaluation is challenging. Existing automatic evaluation metrics are primarily sensitive to n-gram overlap, which is neither necessary nor sufficient for the task of simulating human judgment. We hypothesize that semantic propositional content is an important component of human caption evaluation, and propose a new automated caption evaluation metric defined over scene graphs coined SPICE. Extensive evaluations across a range of models and datasets indicate that SPICE captures human judgments over model-generated captions better than other automatic metrics (e.g., system-level correlation of 0.88 with human judgments on the MS COCO dataset, versus 0.43 for CIDEr and 0.53 for METEOR). Furthermore, SPICE can answer questions such as "which caption-generator best understands colors?" and "can caption-generators count?"
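A metric over scene graphs can be illustrated by comparing the propositions (object, attribute and relation tuples) extracted from a candidate caption with those from the reference captions. The sketch below assumes the tuples are already extracted and scores them with an F1 measure using exact matching. The real metric also involves caption parsing and more flexible matching, which are not reproduced here, and `tuple_f1` is a hypothetical name.

```python
def tuple_f1(candidate_tuples, reference_tuples):
    """F1 overlap between two collections of scene-graph tuples (exact match)."""
    cand, ref = set(candidate_tuples), set(reference_tuples)
    matched = len(cand & ref)
    if matched == 0:
        return 0.0
    precision = matched / len(cand)
    recall = matched / len(ref)
    return 2 * precision * recall / (precision + recall)

# Hand-written tuples for illustration only.
reference = [("girl",), ("table",), ("girl", "young"), ("girl", "sit-at", "table")]
candidate = [("girl",), ("table",), ("girl", "sit-at", "table"), ("table", "red")]
score = tuple_f1(candidate, reference)
```

Because each tuple has a type (object, attribute or relation), the same computation restricted to, say, color attributes or counting tuples gives the kind of per-category breakdown the abstract mentions.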

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2017

Rank Pooling for Action Recognition

Basura Fernando; Efstratios Gavves; M José Oramas; Amir Ghodrati; Tinne Tuytelaars

We propose a function-based temporal pooling method that captures the latent structure of the video sequence data - e.g., how frame-level features evolve over time in a video. We show how the parameters of a function that has been fit to the video data can serve as a robust new video representation. As a specific example, we learn a pooling function via ranking machines. By learning to rank the frame-level features of a video in chronological order, we obtain a new representation that captures the video-wide temporal dynamics of a video, suitable for action recognition. Other than ranking functions, we explore different parametric models that could also explain the temporal changes in videos. The proposed functional pooling methods, and rank pooling in particular, are easy to interpret and implement, fast to compute and effective in recognizing a wide variety of actions. We evaluate our method on various benchmarks for generic action, fine-grained action and gesture recognition. Results show that rank pooling brings an absolute improvement of 7-10% over the average pooling baseline. At the same time, rank pooling is compatible with and complementary to several appearance and local motion based methods and features, such as improved trajectories and deep learning features.

Computer Vision and Pattern Recognition | 2012

Discriminative feature fusion for image classification

Basura Fernando; Elisa Fromont; Damien Muselet; Marc Sebban

Bag-of-words-based image classification approaches mostly rely on low-level local shape features. However, it has been shown that combining multiple cues such as color, texture, or shape is a challenging and promising task which can improve the classification accuracy. Most of the state-of-the-art feature fusion methods aim to weight the cues without considering their statistical dependence in the application at hand. In this paper, we present a new logistic regression-based fusion method, called LRFF, which takes advantage of the different cues without being tied to any of them. We also design a new marginalized kernel by making use of the output of the regression model. We show that such kernels, surprisingly ignored so far by the computer vision community, are particularly well suited to image classification tasks. We compare our approach with existing methods that combine color and shape on three datasets. The proposed learning-based feature fusion process clearly outperforms the state-of-the-art fusion methods for image classification.

International Journal of Computer Vision | 2014

Mining Mid-level Features for Image Classification

Basura Fernando; Elisa Fromont; Tinne Tuytelaars

Mid-level or semi-local features learnt using class-level information are potentially more distinctive than the traditional low-level local features constructed in a purely bottom-up fashion. At the same time they preserve some of the robustness properties with respect to occlusions and image clutter. In this paper we propose a new and effective scheme for extracting mid-level features for image classification, based on relevant pattern mining. In particular, we mine relevant patterns of local compositions of densely sampled low-level features. We refer to the new set of obtained patterns as Frequent Local Histograms or FLHs. During this process, we pay special attention to keeping all the local histogram information and to selecting the most relevant reduced set of FLH patterns for classification. The careful choice of the visual primitives and an extension to exploit both local and global spatial information allow us to build powerful bag-of-FLH-based image representations. We show that these bag-of-FLHs are more discriminative than traditional bag-of-words and yield state-of-the-art results on various image classification benchmarks, including Pascal VOC.

European Conference on Computer Vision | 2012

Effective use of frequent itemset mining for image classification

Basura Fernando; Elisa Fromont; Tinne Tuytelaars

In this paper we propose a new and effective scheme for applying frequent itemset mining to image classification tasks. We refer to the new set of obtained patterns as Frequent Local Histograms or FLHs. During the construction of the FLHs, we pay special attention to keeping all the local histogram information during the mining process and to selecting the most relevant reduced set of FLH patterns for classification. The careful choice of the visual primitives and some proposed extensions to exploit other visual cues such as colour or global spatial information allow us to build powerful bag-of-FLH-based image representations. We show that these bag-of-FLHs are more discriminative than traditional bag-of-words and yield state-of-the-art results on various image classification benchmarks.
