Joaquin Zepeda
Technicolor
Publications
Featured research published by Joaquin Zepeda.
european conference on computer vision | 2016
Himalaya Jain; Patrick Pérez; Rémi Gribonval; Joaquin Zepeda; Hervé Jégou
This paper tackles the task of storing a large collection of vectors, such as visual descriptors, and of searching in it. To this end, we propose to approximate database vectors by constrained sparse coding, where possible atom weights are restricted to belong to a finite subset. This formulation encompasses, as particular cases, previous state-of-the-art methods such as product or residual quantization. As opposed to traditional sparse coding methods, quantized sparse coding includes memory usage as a design constraint, thereby allowing us to index a large collection such as the BIGANN billion-sized benchmark. Our experiments, carried out on standard benchmarks, show that our formulation leads to competitive solutions when considering different trade-offs between learning/coding time, index size and search quality.
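The weight restriction is what separates this scheme from classical sparse coding and what makes the memory budget explicit: each selected atom costs only two small integer indices. The following is a minimal, illustrative sketch of that idea; the random dictionary, the hand-picked weight set, and the greedy pursuit are assumptions of this sketch, not the paper's trained components.

```python
# Minimal sketch of quantized sparse coding: approximate a vector as a short
# sum of dictionary atoms whose weights must come from a small finite set,
# so every term stores as a pair of small integer indices.
import numpy as np

rng = np.random.default_rng(0)
d, n_atoms, n_steps = 64, 256, 4
D = rng.standard_normal((n_atoms, d))
D /= np.linalg.norm(D, axis=1, keepdims=True)    # unit-norm dictionary atoms
weights = np.array([0.25, 0.5, 1.0, 2.0])        # finite set of allowed atom weights

def encode(x):
    """Greedily pick (atom, weight) pairs; each pair stores as two small ints."""
    residual, codes = x.copy(), []
    for _ in range(n_steps):
        # residual error of every (atom, weight) combination
        errs = np.linalg.norm(
            residual[None, None, :] - weights[None, :, None] * D[:, None, :], axis=2)
        a, w = np.unravel_index(errs.argmin(), errs.shape)
        codes.append((a, w))                     # e.g. 8-bit atom id + 2-bit weight id
        residual = residual - weights[w] * D[a]
    return codes

def decode(codes):
    return sum(weights[w] * D[a] for a, w in codes)

x = rng.standard_normal(d)
approx = decode(encode(x))
print("relative error:", np.linalg.norm(x - approx) / np.linalg.norm(x))
```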
international conference on computer vision | 2017
Himalaya Jain; Joaquin Zepeda; Patrick Pérez; Rémi Gribonval
For large-scale visual search, highly compressed yet meaningful representations of images are essential. Structured vector quantizers based on product quantization and its variants are usually employed to achieve such compression while minimizing the loss of accuracy. Yet, unlike binary hashing schemes, these unsupervised methods have not yet benefited from the supervision, end-to-end learning and novel architectures ushered in by the deep learning revolution. We hence propose herein a novel method to make deep convolutional neural networks produce supervised, compact, structured binary codes for visual search. Our method makes use of a novel block-softmax nonlinearity and of batch-based entropy losses that together induce structure in the learned encodings. We show that our method outperforms state-of-the-art compact representations based on deep hashing or structured quantization in single and cross-domain category retrieval, instance retrieval and classification. We make our code and models publicly available online.
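A block-wise nonlinearity of the kind the abstract names can be sketched compactly: the embedding is split into fixed-size blocks, a softmax acts within each block during training, and at search time each block collapses to a one-hot vector, i.e. one small integer per block, much like a product-quantization code. Block and code sizes below are illustrative assumptions.

```python
# Minimal numpy sketch of a block-structured code in the spirit of the
# block-softmax described in the abstract.
import numpy as np

def block_softmax(z, n_blocks):
    """Softmax applied independently within each of n_blocks equal blocks."""
    blocks = z.reshape(n_blocks, -1)
    e = np.exp(blocks - blocks.max(axis=1, keepdims=True))  # numerically stable
    return (e / e.sum(axis=1, keepdims=True)).reshape(-1)

def binarize(z, n_blocks):
    """Test-time code: one-hot argmax per block -> one integer index per block."""
    return z.reshape(n_blocks, -1).argmax(axis=1)

z = np.random.default_rng(0).standard_normal(32)   # stand-in CNN embedding
soft = block_softmax(z, n_blocks=4)                # soft codes used during training
code = binarize(z, n_blocks=4)                     # e.g. 4 indices, each in [0, 8)
print(soft.round(2), code)
```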
european conference on computer vision | 2016
Praveen Kulkarni; Frédéric Jurie; Joaquin Zepeda; Patrick Pérez; Louis Chevallier
The aggregation of image statistics – the so-called pooling step of image classification algorithms – as well as the construction of part-based models, are two distinct and well-studied topics in the literature. The former aims at leveraging a whole set of local descriptors that an image can contain (through spatial pyramids or Fisher vectors for instance) while the latter argues that only a few of the regions an image contains are actually useful for its classification. This paper bridges the two worlds by proposing a new pooling framework based on the discovery of useful parts involved in the pooling of local representations. The key contribution lies in a model integrating a boosted non-linear part classifier as well as a parametric soft-max pooling component, both trained jointly with the image classifier. The experimental validation shows that the proposed model not only consistently surpasses standard pooling approaches but also improves over state-of-the-art part-based models, on several different and challenging classification tasks.
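The parametric soft-max pooling component admits a short sketch: region descriptors are aggregated with weights given by a softmax over part scores, and a temperature beta interpolates between average pooling (beta near zero) and max pooling (large beta). The random part scorer below is a stand-in for the paper's boosted non-linear classifier.

```python
# Minimal sketch of parametric soft-max pooling over region descriptors.
import numpy as np

def softmax_pool(region_feats, part_scores, beta):
    """Weighted aggregation of per-region features; weights = softmax(beta * score)."""
    w = np.exp(beta * (part_scores - part_scores.max()))
    w /= w.sum()
    return w @ region_feats            # (d,) pooled image representation

rng = np.random.default_rng(0)
regions = rng.standard_normal((10, 128))   # 10 region descriptors of dim 128
scores = rng.standard_normal(10)           # usefulness score per region

print(np.allclose(softmax_pool(regions, scores, 1e-6), regions.mean(axis=0)))     # ~average
print(np.allclose(softmax_pool(regions, scores, 1e6), regions[scores.argmax()]))  # ~max
```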
british machine vision conference | 2015
Praveen Kulkarni; Joaquin Zepeda; Frédéric Jurie; Patrick Pérez; Louis Chevallier
We present a method that formulates the selection of the structure of a deep architecture as a penalized, discriminative learning problem. Up to now, the structure of deep architectures has been fixed by hand, and only the weights are learned discriminatively. Our work is a first attempt towards a more formal method of deep structure selection. We consider architectures consisting only of fully-connected layers, and our approach relies on diagonal matrices inserted between consecutive layers. By including the L1 norm of the diagonal entries of said matrices as a regularization penalty, we force the diagonals to be sparse, accordingly selecting the effective number of rows (respectively, columns) of the corresponding layer's (respectively, the following layer's) weight matrix. We carry out experiments on a standard dataset and show that our method succeeds in selecting the structure of deep architectures of multiple layers. One variant of our architecture results in a feature vector of size as little as 36, while retaining very high image classification performance.
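The mechanism is easy to make concrete: an L1 penalty on a diagonal scaling layer drives entries to exactly zero, which prunes the matching rows and columns of the neighboring weight matrices. The sketch below uses random weights, a faked gradient, and a plain soft-thresholding (proximal) step as illustrative stand-ins for the paper's training recipe.

```python
# Minimal sketch: L1-penalized diagonal between two fully-connected layers.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((100, 64))   # first fully-connected layer
W2 = rng.standard_normal((10, 100))   # next fully-connected layer
d = np.ones(100)                      # diagonal inserted between the two layers

def forward(x):
    h = np.maximum(W1 @ x, 0.0)       # fc + ReLU
    return W2 @ (d * h)               # diagonal rescaling feeds the next layer

def soft_threshold(v, lam):
    """Proximal operator of the L1 penalty: shrinks entries and zeroes small ones."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

print(forward(rng.standard_normal(64)).shape)   # (10,) network output

# One illustrative proximal-gradient step on the diagonal; in training the
# gradient would come from backpropagating the task loss.
grad_d = 0.1 * rng.standard_normal(100)
d = soft_threshold(d - 0.5 * grad_d, lam=0.95)
print(f"effective layer width after pruning: {np.count_nonzero(d)} of 100")
```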
international conference on acoustics, speech, and signal processing | 2015
Praveen Kulkarni; Joaquin Zepeda; Frédéric Jurie; Patrick Pérez; Louis Chevallier
asian conference on computer vision | 2014
Aakanksha Rana; Joaquin Zepeda; Patrick Pérez
workshop on applications of computer vision | 2014
Praveen Kulkarni; Gaurav Sharma; Joaquin Zepeda; Louis Chevallier
Deep Convolutional Neural Networks (DCNN) have established a remarkable performance benchmark in the field of image classification, displacing classical approaches based on hand-tailored aggregations of local descriptors. Yet DCNNs impose high computational burdens both at training and at testing time, and training them requires collecting and annotating large amounts of training data. Supervised adaptation methods have been proposed in the literature that partially re-learn a transferred DCNN structure from a new target dataset. Yet these require expensive bounding-box annotations and are still computationally expensive to learn. In this paper, we address these shortcomings of DCNN adaptation schemes by proposing a hybrid approach that combines conventional, unsupervised aggregators such as Bag-of-Words (BoW), with the DCNN pipeline by treating the output of intermediate layers as densely extracted local descriptors. We test a variant of our approach that uses only intermediate DCNN layers on the standard PASCAL VOC 2007 dataset and show performance significantly higher than the standard BoW model and comparable to Fisher vector aggregation but with a feature that is 150 times smaller. A second variant of our approach that includes the fully connected DCNN layers significantly outperforms Fisher vector schemes and performs comparably to DCNN approaches adapted to Pascal VOC 2007, yet at only a small fraction of the training and testing cost.
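The core move, treating an intermediate feature map as a grid of densely extracted local descriptors, can be sketched in a few lines. The random feature map and codebook below stand in for a real DCNN layer and a learned (e.g. k-means) vocabulary.

```python
# Minimal sketch of the hybrid scheme: Bag-of-Words aggregation over the
# spatial positions of an intermediate convolutional feature map.
import numpy as np

rng = np.random.default_rng(0)
fmap = rng.standard_normal((14, 14, 256))     # H x W x C intermediate activations
codebook = rng.standard_normal((64, 256))     # K visual words

def bow_encode(feature_map, codebook):
    """Hard-assign each spatial descriptor to its nearest word, histogram, L2-normalize."""
    descs = feature_map.reshape(-1, feature_map.shape[-1])        # (H*W, C) descriptors
    d2 = ((descs[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    hist = np.bincount(d2.argmin(axis=1), minlength=len(codebook)).astype(float)
    return hist / np.linalg.norm(hist)

print(bow_encode(fmap, codebook).shape)       # (64,) compact image feature
```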
european signal processing conference | 2016
Cagdas Bilen; Joaquin Zepeda; Patrick Pérez
In this paper we propose a generic framework for the optimization of image feature encoders for image retrieval. Our approach uses a triplet-based objective that compares, for a given query image, the similarity scores of an image with a matching and a non-matching image, penalizing triplets that give a higher score to the non-matching image. We use stochastic gradient descent to address the resulting problem and provide the required gradient expressions for generic encoder parameters, applying the resulting algorithm to learn the power normalization parameters commonly used to condition image features. We also propose a modification to codebook-based feature encoders that consists of weighting the local descriptors as a function of their distance to the assigned codeword before aggregating them as part of the encoding process. Using the VLAD feature encoder, we show experimentally that our proposed optimized power normalization method and local descriptor weighting method yield improvements on a standard dataset.
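The triplet objective and the learnable power normalization can be sketched together. The toy encoder, the hand-built triplet, and the finite-difference descent below are illustrative stand-ins for the paper's analytic gradients and stochastic gradient descent solver.

```python
# Minimal sketch: tune a power-normalization exponent with a triplet objective.
import numpy as np

def encode(v, alpha):
    """Toy encoder: power normalization with exponent alpha, then L2 normalization."""
    f = np.abs(v) ** alpha
    return f / np.linalg.norm(f)

def triplet_loss(q, p, n, alpha, margin=0.2):
    fq, fp, fn = (encode(x, alpha) for x in (q, p, n))
    return max(0.0, margin - fq @ fp + fq @ fn)   # penalize high non-match scores

rng = np.random.default_rng(0)
q = rng.standard_normal(128)                 # query image feature
p = q + 0.3 * rng.standard_normal(128)       # matching image
n = q + 0.6 * rng.standard_normal(128)       # hard non-matching image

alpha, lr, eps = 1.0, 0.5, 1e-4
for _ in range(100):                         # crude finite-difference descent
    g = (triplet_loss(q, p, n, alpha + eps)
         - triplet_loss(q, p, n, alpha - eps)) / (2 * eps)
    alpha -= lr * g
print("tuned power-normalization exponent:", round(alpha, 3))
```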
british machine vision conference | 2016
Xinrui Lyu; Joaquin Zepeda; Patrick Pérez
Retrieving images for an arbitrary user query, provided in textual form, is a challenging problem. A recently proposed method addresses this by constructing a visual classifier on the fly, using images returned by an internet image search engine for the query as positives and a fixed pool of negative images. In practice, however, not all the images returned by internet image search are pertinent to the query: some contain abstract or artistic representations of the content, and some have artifacts. Such images degrade the performance of the on-the-fly classifier. We propose a method for improving on-the-fly classifiers by using transfer learning via attributes, as sketched below. We first map the textual query to a set of known attributes and then use those attributes to prune the set of images downloaded from the internet. This pruning step can be seen as zero-shot learning of the visual classifier for the textual query, transferring knowledge from the attribute domain to the query domain. We also use the attributes along with the on-the-fly classifier to score the database images and obtain a hybrid ranking. We show interesting qualitative results and demonstrate through experiments on standard datasets that the proposed method improves upon the baseline on-the-fly classification system.
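A small sketch of the pruning step: score every downloaded image with attribute classifiers, then keep only those that score well on the query's attributes before training the on-the-fly classifier. The attribute list, query-to-attribute mapping, scores, and keep ratio below are all illustrative assumptions, not the paper's models.

```python
# Minimal sketch of attribute-based pruning of internet search results.
import numpy as np

rng = np.random.default_rng(0)
attributes = ["furry", "striped", "metallic", "wheeled"]
query_attrs = {"tiger": ["furry", "striped"]}            # assumed query -> attribute map

downloaded = rng.standard_normal((50, len(attributes)))  # attribute scores per image

def prune(scores, query, keep_ratio=0.5):
    """Keep the images whose mean score over the query's attributes is highest."""
    idx = [attributes.index(a) for a in query_attrs[query]]
    relevance = scores[:, idx].mean(axis=1)
    return relevance.argsort()[::-1][: int(len(scores) * keep_ratio)]

positives = prune(downloaded, "tiger")
print(f"kept {len(positives)} of {len(downloaded)} downloaded images")
# the surviving images then serve as positives for the on-the-fly classifier
```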
european signal processing conference | 2015
Joaquin Zepeda; Mehmet Turkan; Dominique Thoreau
Image retrieval in large image databases is an important problem that drives a number of applications. Yet the use of supervised approaches that address this problem has been limited due to the lack of large labeled datasets for training. Hence, in this paper we introduce two new datasets composed of images extracted from publicly available videos from the Cable News Network (CNN). The proposed datasets are particularly suited to supervised learning for image retrieval and are larger than any other existing dataset of a similar nature. The datasets are further provided with a set of pre-computed, state-of-the-art image feature vectors, as well as baseline results. In order to facilitate research in this important topic, we also detail a generic, supervised learning formulation for image retrieval and a related stochastic solver.
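One common shape for such a supervised retrieval formulation, given here only as an illustrative reading of the abstract, is to learn a projection of the pre-computed features with a triplet ranking loss optimized one sample at a time. The random features, labels, dimensions, and plain SGD update below are assumptions of this sketch, not the paper's exact formulation.

```python
# Minimal sketch: supervised retrieval via a stochastic triplet-based solver.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 64))        # pre-computed image feature vectors
y = rng.integers(0, 10, size=200)         # labels, e.g. which news story an image shows
W = 0.1 * rng.standard_normal((32, 64))   # learned projection into a retrieval space

def sample_triplet():
    q = rng.integers(len(X))
    p = rng.choice(np.flatnonzero(y == y[q]))   # same-label image
    n = rng.choice(np.flatnonzero(y != y[q]))   # different-label image
    return X[q], X[p], X[n]

lr, margin = 1e-3, 1.0
for _ in range(2000):                     # stochastic solver: one triplet per step
    q, p, n = sample_triplet()
    dp, dn = W @ (q - p), W @ (q - n)
    if margin + dp @ dp - dn @ dn > 0:    # subgradient step only on violated triplets
        W -= lr * 2.0 * (np.outer(dp, q - p) - np.outer(dn, q - n))
print("trained projection:", W.shape)
```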