João F. Henriques
University of Coimbra
Publications
Featured research published by João F. Henriques.
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2015
João F. Henriques; Rui Caseiro; Pedro Martins; Jorge Batista
The core component of most modern trackers is a discriminative classifier, tasked with distinguishing between the target and the surrounding environment. To cope with natural image changes, this classifier is typically trained with translated and scaled sample patches. Such sets of samples are riddled with redundancies: any overlapping pixels are constrained to be the same. Based on this simple observation, we propose an analytic model for datasets of thousands of translated patches. By showing that the resulting data matrix is circulant, we can diagonalize it with the discrete Fourier transform, reducing both storage and computation by several orders of magnitude. Interestingly, for linear regression our formulation is equivalent to a correlation filter, used by some of the fastest competitive trackers. For kernel regression, however, we derive a new kernelized correlation filter (KCF) that, unlike other kernel algorithms, has the exact same complexity as its linear counterpart. Building on it, we also propose a fast multi-channel extension of linear correlation filters, via a linear kernel, which we call the dual correlation filter (DCF). Both KCF and DCF outperform top-ranking trackers such as Struck or TLD on a benchmark of 50 videos, despite running at hundreds of frames per second and being implemented in a few lines of code (Algorithm 1). To encourage further developments, our tracking framework has been made open source.
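The closed-form training and detection steps described above admit a very compact implementation. Below is an illustrative single-channel NumPy sketch (function names are ours, not the paper's Algorithm 1), assuming pre-windowed image patches x and z of the same size and a Gaussian-shaped target map y:

```python
import numpy as np

def gaussian_correlation(x, z, sigma):
    """Gaussian kernel values between x and all cyclic shifts of z, via FFTs."""
    # Cross-correlation of x with z over all cyclic shifts.
    xcorr = np.fft.ifft2(np.conj(np.fft.fft2(x)) * np.fft.fft2(z)).real
    dist = (np.sum(x ** 2) + np.sum(z ** 2) - 2.0 * xcorr) / x.size
    return np.exp(-np.maximum(dist, 0) / (sigma ** 2))

def kcf_train(x, y, sigma=0.5, lam=1e-4):
    """Closed-form training: alpha_hat = y_hat / (k_hat^{xx} + lambda)."""
    k = gaussian_correlation(x, x, sigma)
    return np.fft.fft2(y) / (np.fft.fft2(k) + lam)

def kcf_detect(alpha_hat, x, z, sigma=0.5):
    """Response map over all cyclic shifts of the test patch z."""
    k = gaussian_correlation(x, z, sigma)
    return np.fft.ifft2(np.fft.fft2(k) * alpha_hat).real
```

A full tracker would additionally apply a cosine window, extract multi-channel features, and update the model over time; the sketch only shows the core Fourier-domain algebra.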
european conference on computer vision | 2012
João F. Henriques; Rui Caseiro; Pedro Martins; Jorge Batista
Recent years have seen greater interest in the use of discriminative classifiers in tracking systems, owing to their success in object detection. They are trained online with samples collected during tracking. Unfortunately, the potentially large number of samples becomes a computational burden, which directly conflicts with real-time requirements. On the other hand, limiting the samples may sacrifice performance. Interestingly, we observed that, as we add more and more samples, the problem acquires circulant structure. Using the well-established theory of circulant matrices, we provide a link to Fourier analysis that opens up the possibility of extremely fast learning and detection with the Fast Fourier Transform. This can be done in the dual space of kernel machines as fast as with linear classifiers. We derive closed-form solutions for training and detection with several types of kernels, including the popular Gaussian and polynomial kernels. The resulting tracker achieves performance competitive with the state of the art, can be implemented with only a few lines of code, and runs at hundreds of frames per second. MATLAB code is provided in the paper (see Algorithm 1).
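As a concrete illustration of the circulant structure (a NumPy sketch, not the MATLAB code from the paper): the kernel matrix over all cyclic shifts of a base sample is circulant, so the dual ridge-regression solution of a kernel machine can be obtained with a single FFT instead of a cubic-cost solve.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma, lam = 64, 0.5, 1e-2
x = rng.standard_normal(n)
y = rng.standard_normal(n)                      # one regression target per shift

# Kernel matrix over all n cyclic shifts of x (the dense sampling of the paper).
shifts = np.stack([np.roll(x, i) for i in range(n)])
sq_dist = np.sum((shifts[:, None, :] - shifts[None, :, :]) ** 2, axis=2)
K = np.exp(-sq_dist / (sigma ** 2 * n))

# K is circulant: every row is a cyclic shift of the first one.
assert np.allclose(K[1], np.roll(K[0], 1))

# Dual ridge regression (kernel machine): alpha = (K + lam*I)^{-1} y  -- O(n^3).
alpha_direct = np.linalg.solve(K + lam * np.eye(n), y)

# The same solution with one FFT, using only the first row of K  -- O(n log n).
alpha_fft = np.fft.ifft(np.fft.fft(y) / (np.fft.fft(K[0]) + lam)).real

assert np.allclose(alpha_direct, alpha_fft, atol=1e-6)
```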
european conference on computer vision | 2016
Matej Kristan; Roman P. Pflugfelder; Aleš Leonardis; Jiri Matas; Luka Cehovin; Georg Nebehay; Tomas Vojir; Gustavo Fernández; Alan Lukezic; Aleksandar Dimitriev; Alfredo Petrosino; Amir Saffari; Bo Li; Bohyung Han; CherKeng Heng; Christophe Garcia; Dominik Pangersic; Gustav Häger; Fahad Shahbaz Khan; Franci Oven; Horst Bischof; Hyeonseob Nam; Jianke Zhu; Jijia Li; Jin Young Choi; Jin-Woo Choi; João F. Henriques; Joost van de Weijer; Jorge Batista; Karel Lebeda
Visual tracking has attracted significant attention in the last few decades. The recent surge in the number of publications on tracking-related problems has made it almost impossible to follow developments in the field. One of the reasons is the lack of commonly accepted annotated datasets and standardized evaluation protocols that would allow objective comparison of different tracking methods. To address this issue, the Visual Object Tracking (VOT) workshop was organized in conjunction with ICCV 2013. Researchers from academia as well as industry were invited to participate in the first VOT2013 challenge, which aimed at single-object visual trackers that do not apply pre-learned models of object appearance (model-free). Presented here is the VOT2013 benchmark dataset for evaluation of single-object visual trackers, as well as the results obtained by the trackers competing in the challenge. In contrast to related attempts in tracker benchmarking, the dataset is labeled per frame with visual attributes that indicate occlusion, illumination change, motion change, size change and camera motion, offering a more systematic comparison of the trackers. Furthermore, we have designed an automated system for performing and evaluating the experiments. We present the evaluation protocol of the VOT2013 challenge and the results of a comparison of 27 trackers on the benchmark dataset. The dataset, the evaluation tools and the tracker rankings are publicly available from the challenge website (http://votchallenge.net).
european conference on computer vision | 2016
Luca Bertinetto; Jack Valmadre; João F. Henriques; Andrea Vedaldi; Philip H. S. Torr
The problem of arbitrary object tracking has traditionally been tackled by learning a model of the object’s appearance exclusively online, using as sole training data the video itself. Despite the success of these methods, their online-only approach inherently limits the richness of the model they can learn. Recently, several attempts have been made to exploit the expressive power of deep convolutional networks. However, when the object to track is not known beforehand, it is necessary to perform Stochastic Gradient Descent online to adapt the weights of the network, severely compromising the speed of the system. In this paper we equip a basic tracking algorithm with a novel fully-convolutional Siamese network trained end-to-end on the ILSVRC15 dataset for object detection in video. Our tracker operates at frame-rates beyond real-time and, despite its extreme simplicity, achieves state-of-the-art performance in multiple benchmarks.
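The sketch below shows only the last step of such a tracker: cross-correlating the learned embedding of the exemplar with the embedding of a larger search region to produce a score map. The convolutional embedding itself, which the paper trains end-to-end, is assumed to be given as plain arrays; the function name is illustrative.

```python
import numpy as np

def siamese_score_map(exemplar_feat, search_feat):
    """Cross-correlation score map between embedded exemplar and search region.

    exemplar_feat: (C, h, w) embedding of the target exemplar, phi(z).
    search_feat:   (C, H, W) embedding of the larger search region, phi(x).
    Returns an (H - h + 1, W - w + 1) map; its peak locates the target.
    """
    C, h, w = exemplar_feat.shape
    _, H, W = search_feat.shape
    scores = np.empty((H - h + 1, W - w + 1))
    for i in range(scores.shape[0]):
        for j in range(scores.shape[1]):
            window = search_feat[:, i:i + h, j:j + w]
            scores[i, j] = np.sum(exemplar_feat * window)
    return scores
```

In practice this sliding-window sum is implemented as a single convolution so it runs on the GPU, which is what allows the tracker to exceed real-time frame rates.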
international conference on computer vision | 2011
João F. Henriques; Rui Caseiro; Jorge Batista
Multiple object tracking has been formulated recently as a global optimization problem, and solved efficiently with optimal methods such as the Hungarian Algorithm. A severe limitation is the inability to model multiple objects that are merged into a single measurement, and track them as a group, while retaining optimality. This work presents a new graph structure that encodes these multiple-match events as standard one-to-one matches, allowing computation of the solution in polynomial time. Since identities are lost when objects merge, an efficient method to identify groups is also presented, as a flow circulation problem. The problem of tracking individual objects across groups is then posed as a standard optimal assignment. Experiments show increased performance on the PETS 2006 and 2009 datasets compared to state-of-the-art algorithms.
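For reference, the baseline one-to-one assignment step that the proposed graph structure generalizes can be written with an off-the-shelf Hungarian solver. This is an illustrative sketch: the centroid distance cost and the gating threshold are our assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_detections(tracks, detections, gate=50.0):
    """Optimal one-to-one frame-to-frame assignment of detections to tracks.

    tracks, detections: (N, 2) and (M, 2) arrays of object centroids.
    Returns (track_index, detection_index) pairs within the gating distance.
    """
    cost = np.linalg.norm(tracks[:, None, :] - detections[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)        # Hungarian algorithm
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= gate]
```

The paper's contribution is to encode merge events (several objects covered by one measurement) as additional one-to-one matches in an extended graph, so the same polynomial-time machinery still yields the optimal solution.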
international conference on computer vision | 2013
João F. Henriques; Joao Carreira; Rui Caseiro; Jorge Batista
Competitive sliding window detectors require vast training sets. Since a pool of natural images provides a nearly endless supply of negative samples, in the form of patches at different scales and locations, training with all the available data is considered impractical. A staple of current approaches is hard negative mining, a method of selecting relevant samples, which is nevertheless expensive. Given that samples at slightly different locations have overlapping support, there seems to be an enormous amount of duplicated work. It is natural, then, to ask whether these redundancies can be eliminated. In this paper, we show that the Gram matrix describing such data is block-circulant. We derive a transformation based on the Fourier transform that block-diagonalizes the Gram matrix, at once eliminating redundancies and partitioning the learning problem. This decomposition is valid for any dense features and several learning algorithms, and takes full advantage of modern parallel architectures. Surprisingly, it allows training with all the potential samples in sets of thousands of images. By considering the full set, we generate in a single shot the optimal solution, which is usually obtained only after several rounds of hard negative mining. We report speed gains on Caltech Pedestrians and INRIA Pedestrians of over an order of magnitude, allowing training on a desktop computer in a couple of minutes.
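A minimal NumPy sketch of this partitioning, restricted for brevity to one multi-channel base sample (the paper handles many base samples and arbitrary dense features): after a DFT, the ridge regression over all cyclic shifts splits into one tiny system per Fourier frequency, and the result matches the direct solve over the full data matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
c, n, lam = 3, 32, 1e-2
x = rng.standard_normal((c, n))      # one base sample with c feature channels
y = rng.standard_normal(n)           # one label per cyclic shift

# Direct ridge regression over all n shifted samples: one (c*n)-sized solve.
X = np.hstack([np.stack([np.roll(x[ch], i) for i in range(n)]) for ch in range(c)])
w_direct = np.linalg.solve(X.T @ X + lam * np.eye(c * n), X.T @ y).reshape(c, n)

# Fourier-domain solution: n independent c-by-c systems, one per frequency.
x_hat, y_hat = np.fft.fft(x, axis=1), np.fft.fft(y)
w_hat = np.zeros((c, n), dtype=complex)
for f in range(n):
    a = x_hat[:, f]                               # channel responses at this frequency
    A = np.outer(a, a.conj()) + lam * np.eye(c)   # tiny c x c system
    w_hat[:, f] = np.linalg.solve(A, a * y_hat[f])
w_fft = np.fft.ifft(w_hat, axis=1).real

assert np.allclose(w_direct, w_fft, atol=1e-6)
```

Because the per-frequency systems are independent, they can be dispatched across cores or machines, which is what makes training on all translated samples of thousands of images tractable.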
computer vision and pattern recognition | 2017
Jack Valmadre; Luca Bertinetto; João F. Henriques; Andrea Vedaldi; Philip H. S. Torr
The Correlation Filter is an algorithm that trains a linear template to discriminate between images and their translations. It is well suited to object tracking because its formulation in the Fourier domain provides a fast solution, enabling the detector to be re-trained once per frame. Previous works that use the Correlation Filter, however, have adopted features that were either manually designed or trained for a different task. This work is the first to overcome this limitation by interpreting the Correlation Filter learner, which has a closed-form solution, as a differentiable layer in a deep neural network. This enables learning deep features that are tightly coupled to the Correlation Filter. Experiments illustrate that our method has the important practical benefit of allowing lightweight architectures to achieve state-of-the-art performance at high framerates.
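A hedged PyTorch sketch of the idea, in a simplified per-channel form rather than the paper's exact multi-channel derivation: because the closed-form filter is a composition of FFTs and elementwise operations, automatic differentiation can propagate gradients through it and into the feature extractor.

```python
import torch

def correlation_filter_layer(x, y, lam=1e-3):
    """Forward pass of a correlation-filter "layer" (illustrative simplification).

    x: (B, C, H, W) feature maps produced by a CNN backbone.
    y: (H, W) desired response, e.g. a Gaussian centred on the target.
    Returns per-channel filters of shape (B, C, H, W). Every operation here
    (FFT, conjugation, elementwise algebra) is differentiable, so autograd
    can push gradients back into the features x.
    """
    x_f = torch.fft.fft2(x)
    y_f = torch.fft.fft2(y)
    w_f = torch.conj(x_f) * y_f / (x_f.abs() ** 2 + lam)
    return torch.fft.ifft2(w_f).real
```

In a full tracker the resulting filter is cross-correlated with the features of a search region, and a loss on that response map trains the backbone end-to-end.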
computer vision and pattern recognition | 2015
Rui Caseiro; João F. Henriques; Pedro Martins; Jorge Batista
Recently, a particular paradigm [18] in the domain adaptation field has received considerable attention by introducing novel and important insights to the problem. In this case, the source and target domains are represented in the form of subspaces, which are treated as points on the Grassmann manifold. The geodesic curve between them is sampled to obtain intermediate points, and a classifier is then learnt using the projections of the data onto these subspaces. Despite its relevance and popularity, this paradigm [18] has some limitations. Firstly, in real-world applications, that simple curve (i.e. the shortest path) does not provide the necessary flexibility to model the domain shift between the training and testing datasets. Secondly, by using the geodesic curve, we are restricted to only one source domain, which prevents us from taking full advantage of the multiple datasets that are available nowadays. It is natural, then, to ask whether this popular concept can be extended to deal with more complex curves and to integrate multiple source domains. This is a hard problem given the Riemannian structure of the space, but we propose a mathematically well-founded idea that enables us to solve it. We exploit the geometric insight of rolling maps [30] to compute a spline curve on the Grassmann manifold. The benefits of the proposed idea are demonstrated through several empirical studies on standard datasets. This novel concept integrates multiple source domains explicitly, whereas the previous paradigm [18] uses the mean of all sources, enabling a better model of the domain shift and taking full advantage of the training datasets.
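For context, the geodesic-sampling step of the baseline paradigm [18], which this work generalizes to spline curves and multiple sources, can be sketched as follows. This is an illustrative NumPy implementation of the standard Grassmann geodesic between two subspaces, not the authors' code.

```python
import numpy as np

def grassmann_geodesic(S, T, ts):
    """Intermediate subspaces along the geodesic between two d-dim subspaces.

    S, T: (D, d) matrices with orthonormal columns (source / target subspaces).
    ts:   iterable of values in [0, 1]; t=0 gives span(S), t=1 gives span(T).
    Returns a list of (D, d) orthonormal bases of the intermediate subspaces.
    """
    D, d = S.shape
    # Orthonormal basis of the complement of S.
    R = np.linalg.svd(np.eye(D) - S @ S.T)[0][:, :D - d]
    U1, cos_theta, Vt = np.linalg.svd(S.T @ T)
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))        # principal angles
    # Second block of the CS decomposition: R^T T V = -U2 * sin(theta).
    U2 = -(R.T @ T @ Vt.T) / np.maximum(np.sin(theta), 1e-12)
    return [S @ U1 @ np.diag(np.cos(t * theta))
            - R @ U2 @ np.diag(np.sin(t * theta)) for t in ts]
```

Data from the source and target domains would then be projected onto each intermediate basis before learning a classifier; the spline formulation of the paper replaces this single shortest path by a smooth curve through several source subspaces.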
european conference on computer vision | 2012
Pedro Martins; Rui Caseiro; João F. Henriques; Jorge Batista
This work presents a simple and very efficient solution to align facial parts in unseen images. Our solution relies on a Point Distribution Model (PDM) face model and a set of discriminant local detectors, one for each facial landmark. The patch responses can be embedded into a Bayesian inference problem, where the posterior distribution of the global warp is inferred in a maximum a posteriori (MAP) sense. However, previous formulations do not explicitly model the covariance of the latent variables, which represents the confidence in the current solution. In our Discriminative Bayesian Active Shape Model (DBASM) formulation, the MAP global alignment is inferred by a Linear Dynamical System (LDS) that takes this information into account. The Bayesian paradigm provides an effective fitting strategy, since it combines in the same framework both the shape prior and multiple sets of patch alignment classifiers to further improve accuracy. Extensive evaluations were performed on several datasets, including the challenging Labeled Faces in the Wild (LFW). Face part descriptors were also evaluated, including the recently proposed Minimum Output Sum of Squared Error (MOSSE) filter. The proposed Bayesian optimization strategy improves on the state of the art while using the same local detectors. We also show that MOSSE filters further improve on these results.
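As a rough illustration of the kind of Gaussian MAP fusion such a formulation builds on (a generic sketch with hypothetical variable names, not the paper's DBASM/LDS equations), the shape prior and the local detector responses can be combined by precision weighting, which also yields the posterior covariance that earlier formulations discard.

```python
import numpy as np

def map_shape_update(mu_prior, Sigma_prior, H, R, y):
    """Gaussian MAP fusion of a shape prior with local detector measurements.

    mu_prior, Sigma_prior: prior mean/covariance of the warp parameters (PDM).
    H: linearised mapping from warp parameters to landmark positions.
    R: measurement noise covariance (confidence of the local detectors).
    y: landmark positions suggested by the local detector responses.
    Returns the posterior mean and covariance of the global warp.
    """
    prior_info = np.linalg.inv(Sigma_prior)
    meas_info = H.T @ np.linalg.inv(R) @ H
    Sigma_post = np.linalg.inv(prior_info + meas_info)
    mu_post = Sigma_post @ (prior_info @ mu_prior + H.T @ np.linalg.inv(R) @ y)
    return mu_post, Sigma_post
```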
computer vision and pattern recognition | 2013
Rui Caseiro; Pedro Martins; João F. Henriques; Fátima Silva Leite; Jorge Batista