
Publication


Featured research published by Jasper R. R. Uijlings.


IEEE Transactions on Multimedia | 2010

Real-Time Visual Concept Classification

Jasper R. R. Uijlings; Arnold W. M. Smeulders; Remko Scha

As datasets grow increasingly large in content-based image and video retrieval, computational efficiency of concept classification is important. This paper reviews techniques to accelerate concept classification, where we show the trade-off between computational efficiency and accuracy. As a basis, we use the Bag-of-Words algorithm that in the 2008 benchmarks of TRECVID and PASCAL led to the best performance scores. We divide the evaluation into three steps: 1) Descriptor Extraction, where we evaluate SIFT, SURF, DAISY, and Semantic Textons. 2) Visual Word Assignment, where we compare a k-means visual vocabulary with a Random Forest and evaluate subsampling, dimension reduction with PCA, and division strategies of the Spatial Pyramid. 3) Classification, where we evaluate the χ2, RBF, and Fast Histogram Intersection kernels for the SVM. Apart from the evaluation, we accelerate the calculation of densely sampled SIFT and SURF, accelerate nearest-neighbor assignment, and improve the accuracy of the Histogram Intersection kernel. We conclude by discussing whether further acceleration of the Bag-of-Words pipeline is possible. Our results lead to a 7-fold speed increase without accuracy loss, and a 70-fold speed increase with a 3% accuracy loss. The latter system performs classification in real time, which opens up new applications for automatic concept classification. For example, this system permits five standard desktop PCs to automatically tag all images currently uploaded to Flickr with 20 concept classes.
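
The pipeline's three steps can be conveyed with a short numpy toy (an illustrative reconstruction, not the authors' code; all function names here are made up): hard assignment of local descriptors to a k-means vocabulary, an L1-normalised Bag-of-Words histogram per image, and the Histogram Intersection kernel for the SVM.

```python
import numpy as np

def assign_visual_words(descriptors, vocabulary):
    """Hard-assign each local descriptor to its nearest visual word
    (Euclidean nearest neighbour among the k-means centres)."""
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def bow_histogram(descriptors, vocabulary):
    """L1-normalised Bag-of-Words histogram for one image."""
    words = assign_visual_words(descriptors, vocabulary)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / max(hist.sum(), 1.0)

def histogram_intersection_kernel(H1, H2):
    """Kernel matrix K[i, j] = sum_k min(H1[i, k], H2[j, k])
    between two stacks of row histograms."""
    return np.minimum(H1[:, None, :], H2[None, :, :]).sum(axis=2)
```

The accelerations listed in the abstract target exactly these components: the nearest-neighbour assignment and the Histogram Intersection kernel.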


IEEE Transactions on Multimedia | 2012

Web Image Annotation Via Subspace-Sparsity Collaborated Feature Selection

Zhigang Ma; Feiping Nie; Yi Yang; Jasper R. R. Uijlings; Nicu Sebe

The number of web images has been explosively growing due to the development of network and storage technology. These images make up a large amount of current multimedia data and are closely related to our daily life. To efficiently browse, retrieve and organize the web images, numerous approaches have been proposed. Since the semantic concepts of the images can be indicated by label information, automatic image annotation becomes one effective technique for image management tasks. Most existing annotation methods use image features that are often noisy and redundant. Hence, feature selection can be exploited for a more precise and compact representation of the images, thus improving the annotation performance. In this paper, we propose a novel feature selection method and apply it to automatic image annotation. There are two appealing properties of our method. First, it can jointly select the most relevant features from all the data points by using a sparsity-based model. Second, it can uncover the shared subspace of original features, which is beneficial for multi-label learning. To solve the objective function of our method, we propose an efficient iterative algorithm. Extensive experiments are performed on large image databases that are collected from the web. The experimental results together with the theoretical analysis validate the effectiveness of our method for feature selection, thus demonstrating its feasibility for web image annotation.
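
The sparsity-based joint selection can be illustrated with the standard iterative reweighting scheme for ℓ2,1-regularised least squares (a simplified stand-in: the paper's actual objective additionally uncovers a shared subspace, which this sketch omits, and the name `l21_feature_selection` is illustrative):

```python
import numpy as np

def l21_feature_selection(X, Y, gamma=1.0, n_iter=30, eps=1e-8):
    """Approximately minimise ||X W - Y||_F^2 + gamma * ||W||_{2,1}.

    X: (n_samples, n_features) data, Y: (n_samples, n_labels) labels.
    The row-sparsity induced by the L2,1 norm selects features jointly
    across all data points and all labels; rows of W with large L2 norm
    mark the selected features."""
    n_features = X.shape[1]
    d = np.ones(n_features)            # reweighting diagonal, D_ii = 1 / (2 ||w_i||_2)
    W = np.zeros((n_features, Y.shape[1]))
    for _ in range(n_iter):
        # Closed-form update for fixed D: W = (X^T X + gamma D)^{-1} X^T Y
        A = X.T @ X + gamma * np.diag(d)
        W = np.linalg.solve(A, X.T @ Y)
        row_norms = np.sqrt((W ** 2).sum(axis=1)) + eps
        d = 1.0 / (2.0 * row_norms)
    return W
```

Ranking features by the row norms of W and keeping the top-ranked ones yields the compact representation the abstract describes.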


ACM International Conference on Image and Video Retrieval | 2009

Real-time bag of words, approximately

Jasper R. R. Uijlings; Arnold W. M. Smeulders; Remko Scha

We start from the state-of-the-art Bag-of-Words pipeline that in the 2008 benchmarks of TRECVID and PASCAL yielded the best performance scores. We have contributed to that pipeline, which now forms the basis to compare various fast alternatives for all of its components: (i) For descriptor extraction we propose a fast algorithm to densely sample SIFT and SURF, and we compare several variants of these descriptors. (ii) For descriptor projection we compare a k-means visual vocabulary with a Random Forest. As a pre-projection step we experiment with PCA on the descriptors to decrease projection time. (iii) For classification we use Support Vector Machines and compare the χ2 kernel with the RBF kernel. Our results lead to a 10-fold speed increase without any loss of accuracy and to a 30-fold speed increase with a 17% loss of accuracy, where the latter system does real-time classification at 26 images per second.
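
The pre-projection step can be sketched as plain PCA on the pooled local descriptors (an illustrative numpy sketch, not the authors' implementation; `pca_project` is a made-up name). Shrinking the descriptor dimension directly shrinks the cost of projecting each descriptor onto the visual vocabulary, which is linear in that dimension.

```python
import numpy as np

def pca_project(descriptors, n_components):
    """Fit PCA on a sample of local descriptors and return the reduced
    descriptors together with the mean and basis, so descriptors from
    new images can be projected into the same low-dimensional space."""
    mean = descriptors.mean(axis=0)
    centred = descriptors - mean
    # Right singular vectors of the centred data = principal directions.
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)
    components = Vt[:n_components]
    return centred @ components.T, mean, components
```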


IEEE Transactions on Multimedia | 2012

Discriminating Joint Feature Analysis for Multimedia Data Understanding

Zhigang Ma; Feiping Nie; Yi Yang; Jasper R. R. Uijlings; Nicu Sebe; Alexander G. Hauptmann

In this paper, we propose a novel semi-supervised feature analyzing framework for multimedia data understanding and apply it to three different applications: image annotation, video concept detection and 3-D motion data analysis. Our method is built upon two advancements of the state of the art: (1) ℓ2,1-norm regularized feature selection, which can jointly select the most relevant features from all the data points. This feature selection approach has been shown to be robust and efficient in the literature, as it considers the correlations between different features jointly when conducting feature selection; (2) manifold learning, which analyzes the feature space by exploiting both labeled and unlabeled data. It is a widely used technique to extend many algorithms to semi-supervised scenarios for its capability of leveraging the manifold structure of multimedia data. The proposed method is able to learn a classifier for different applications by selecting the discriminating features closely related to the semantic concepts. The objective function of our method is non-smooth and difficult to solve, so we design an efficient iterative algorithm with fast convergence, thus making it applicable to practical applications. Extensive experiments on image annotation, video concept detection and 3-D motion data analysis are performed on different real-world data sets to demonstrate the effectiveness of our algorithm.
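
The manifold-learning ingredient rests on a graph Laplacian built over labeled and unlabeled points together; penalising Tr(F^T L F) in a supervised loss is the usual way such methods let unlabeled data shape the classifier. A minimal construction follows (illustrative only; the paper's full objective combines a term of this kind with the ℓ2,1 feature-selection term):

```python
import numpy as np

def graph_laplacian(X, sigma=1.0):
    """Unnormalised graph Laplacian L = D - S with a Gaussian affinity
    S_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) over all data points,
    labeled and unlabeled alike."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    S = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(S, 0.0)  # no self-loops
    return np.diag(S.sum(axis=1)) - S
```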


ACM Multimedia | 2011

Exploiting the entire feature space with sparsity for automatic image annotation

Zhigang Ma; Yi Yang; Feiping Nie; Jasper R. R. Uijlings; Nicu Sebe

The explosive growth of digital images requires effective methods to manage these images. Among various existing methods, automatic image annotation has proved to be an important technique for image management tasks, e.g., image retrieval over large-scale image databases. Automatic image annotation has been widely studied during recent years and a considerable number of approaches have been proposed. However, the performance of these methods is not yet satisfactory, demanding further research on image annotation. In this paper, we propose a novel semi-supervised framework built upon feature selection for automatic image annotation. Our method aims to jointly select the most relevant features from all the data points by using a sparsity-based model and exploiting both labeled and unlabeled data to learn the manifold structure. Our framework is able to simultaneously learn a robust classifier for image annotation by selecting the discriminating features related to the semantic concepts. To solve the objective function of our framework, we propose an efficient iterative algorithm. Extensive experiments are performed on different real-world image datasets with the results demonstrating the promising performance of our framework for automatic image annotation.


Computer Vision and Pattern Recognition | 2009

What is the spatial extent of an object?

Jasper R. R. Uijlings; Arnold W. M. Smeulders; Remko Scha

This paper discusses the question: can we improve the recognition of objects by using their spatial context? We start from Bag-of-Words models and use the Pascal 2007 dataset. We use the rough object bounding boxes that come with this dataset to investigate the fundamental gain context can bring. Our main contributions are: (I) The result of Zhang et al. (CVPR 2007) that context is superfluous, derived from the 4-class Pascal 2005 dataset, does not generalize to this dataset; for our larger and more realistic dataset, context is indeed important. (II) Using the rough bounding box to limit or extend the scope of an object during both training and testing, we find that the spatial extent of an object is determined by its category: (a) well-defined, rigid objects have the object itself as the preferred spatial extent; (b) non-rigid objects have an unbounded spatial extent: all spatial extents produce equally good results; (c) objects primarily categorised by their function have the whole image as their spatial extent. Finally, (III) using the rough bounding box to treat object and context separately, we find that the upper bound of improvement is 26% (12% absolute) in terms of mean average precision, and this bound is likely to be higher if the localisation is done using segmentation. We conclude that object localisation, if done sufficiently precisely, helps considerably in the recognition of objects for the Pascal 2007 dataset.


ACM Multimedia | 2012

In the eye of the beholder: employing statistical analysis and eye tracking for analyzing abstract paintings

Victoria Yanulevskaya; Jasper R. R. Uijlings; Elia Bruni; Andreza Sartori; Elisa Zamboni; Francesca Bacci; David Melcher; Nicu Sebe

Most artworks are explicitly created to evoke a strong emotional response. Over the centuries, several art movements have employed different techniques to achieve the emotional expressions conveyed by artworks, yet people have consistently been able to read the emotional messages even from the most abstract paintings. Can a machine learn what makes an artwork emotional? In this work, we consider a set of 500 abstract paintings from the Museum of Modern and Contemporary Art of Trento and Rovereto (MART), where each painting was scored as carrying a positive or negative response on a Likert scale of 1-7. We employ a state-of-the-art recognition system to learn which statistical patterns are associated with positive and negative emotions. Additionally, we dissect the classification machinery to determine which parts of an image evoke which emotions. This opens new opportunities to research why a specific painting is perceived as emotional. We also demonstrate how quantification of evidence for positive and negative emotions can be used to predict the way in which people observe paintings.


International Journal of Multimedia Information Retrieval | 2015

Video classification with Densely extracted HOG/HOF/MBH features: an evaluation of the accuracy/computational efficiency trade-off

Jasper R. R. Uijlings; Ionut C. Duta; Enver Sangineto; Nicu Sebe

The current state of the art in video classification is based on Bag-of-Words using local visual descriptors, most commonly histogram of oriented gradients (HOG), histogram of optical flow (HOF) and motion boundary histogram (MBH) descriptors. While such an approach is very powerful for classification, it is also computationally expensive. This paper addresses the problem of computational efficiency. Specifically: (1) we propose several speed-ups for densely sampled HOG, HOF and MBH descriptors and release Matlab code; (2) we investigate the trade-off between accuracy and computational efficiency of descriptors in terms of frame sampling rate and type of optical flow method; (3) we investigate the trade-off between accuracy and computational efficiency for computing the feature vocabulary, using and comparing most of the commonly adopted vector quantization techniques.
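
The flavour of a densely extracted HOG descriptor can be conveyed in a compact numpy version (a simplified sketch, not the released Matlab code: no block normalisation and no interpolation between bins or cells; `dense_hog` is an illustrative name):

```python
import numpy as np

def dense_hog(frame, cell=8, n_bins=8):
    """Per-cell, gradient-magnitude-weighted orientation histograms
    on a dense grid.

    frame: 2-D grayscale array; returns an array of shape
    (rows, cols, n_bins), one histogram per cell x cell patch."""
    gy, gx = np.gradient(frame.astype(float))
    mag = np.hypot(gx, gy)
    # Unsigned orientation, binned over [0, pi).
    ori = np.mod(np.arctan2(gy, gx), np.pi)
    bins = np.minimum((ori / np.pi * n_bins).astype(int), n_bins - 1)
    rows, cols = frame.shape[0] // cell, frame.shape[1] // cell
    hog = np.zeros((rows, cols, n_bins))
    for r in range(rows):
        for c in range(cols):
            b = bins[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell].ravel()
            m = mag[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell].ravel()
            hog[r, c] = np.bincount(b, weights=m, minlength=n_bins)
    return hog
```

HOF and MBH follow the same recipe, binning optical-flow orientations and the orientations of the flow's spatial gradients, respectively, in place of image gradients.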


International Conference on Multimedia Retrieval | 2014

Realtime Video Classification using Dense HOF/HOG

Jasper R. R. Uijlings; Ionut C. Duta; Negar Rostamzadeh; Nicu Sebe


Computer Vision and Pattern Recognition | 2015

Situational object boundary detection

Jasper R. R. Uijlings; Vittorio Ferrari


Collaboration


Dive into Jasper R. R. Uijlings's collaborations.

Top Co-Authors

Bogdan Ionescu

Politehnica University of Bucharest


Remko Scha

University of Amsterdam


Ionut Mironica

Politehnica University of Bucharest
