Publication


Featured research published by Jiaolong Xu.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2014

Domain Adaptation of Deformable Part-Based Models

Jiaolong Xu; Sebastian Ramos; David Vázquez; Antonio M. López

The accuracy of object classifiers can significantly drop when the training data (source domain) and the application scenario (target domain) have inherent differences. Therefore, adapting the classifiers to the scenario in which they must operate is of paramount importance. We present novel domain adaptation (DA) methods for object detection. As proof of concept, we focus on adapting the state-of-the-art deformable part-based model (DPM) for pedestrian detection. We introduce an adaptive structural SVM (A-SSVM) that adapts a pre-learned classifier between different domains. By taking into account the inherent structure in feature space (e.g., the parts in a DPM), we propose a structure-aware A-SSVM (SA-SSVM). Neither A-SSVM nor SA-SSVM needs to revisit the source-domain training data to perform the adaptation. Rather, a low number of target-domain training examples (e.g., pedestrians) are used. To address the scenario where there are no target-domain annotated samples, we propose a self-adaptive DPM based on a self-paced learning (SPL) strategy and a Gaussian Process Regression (GPR). Two types of adaptation tasks are assessed: from both synthetic pedestrians and general persons (PASCAL VOC) to pedestrians imaged from an on-board camera. Results show that our proposals avoid accuracy drops as high as 15 points when comparing adapted and non-adapted detectors.
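The adaptation at the heart of A-SSVM can be illustrated in a few lines. Below is a minimal sketch of the idea for a plain binary linear SVM (the paper's structured, part-aware variants build on the same principle): learn an offset from the pre-trained source weights using target samples only, with a regularizer that keeps the adapted model close to the source one. The function name, the plain subgradient solver, and all parameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def adapt_linear_svm(w_src, X_tgt, y_tgt, C=1.0, lr=0.01, epochs=200):
    """Learn w = w_src + dw on target samples only; the ||dw||^2 term keeps
    the adapted classifier close to the pre-learned source model."""
    dw = np.zeros_like(w_src)
    for _ in range(epochs):
        margins = y_tgt * (X_tgt @ (w_src + dw))
        active = margins < 1.0                                # hinge violators
        grad = dw - C * (y_tgt[active][:, None] * X_tgt[active]).sum(axis=0)
        dw -= lr * grad                                       # subgradient step
    return w_src + dw

# Usage: a handful of target-domain examples shifts the decision boundary
# without ever revisiting the source-domain training data.
rng = np.random.default_rng(0)
w_src = rng.normal(size=5)
X_tgt = rng.normal(size=(20, 5))
y_tgt = np.where(X_tgt @ w_src + rng.normal(scale=2.0, size=20) > 0, 1.0, -1.0)
w_adapted = adapt_linear_svm(w_src, X_tgt, y_tgt)
```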


IEEE Transactions on Intelligent Transportation Systems | 2014

Learning a Part-Based Pedestrian Detector in a Virtual World

Jiaolong Xu; David Vázquez; Antonio M. López; Javier Marín; Daniel Ponsa

Detecting pedestrians with on-board vision systems is of paramount interest for assisting drivers to prevent vehicle-to-pedestrian accidents. The core of a pedestrian detector is its classification module, which aims at deciding if a given image window contains a pedestrian. Given the difficulty of this task, many classifiers have been proposed during the last 15 years. Among them, the so-called (deformable) part-based classifiers, including multiview modeling, are usually top ranked in accuracy. Training such classifiers is not trivial since a proper aspect clustering and spatial part alignment of the pedestrian training samples are crucial for obtaining an accurate classifier. In this paper, we first perform automatic aspect clustering and part alignment by using virtual-world pedestrians, i.e., human annotations are not required. Second, we use a mixture-of-parts approach that allows part sharing among different aspects. Third, these proposals are integrated in a learning framework, which also allows incorporating real-world training data to perform domain adaptation between virtual- and real-world cameras. Overall, the obtained results on four popular on-board data sets show that our proposal clearly outperforms the state-of-the-art deformable part-based detector known as latent support vector machine.
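Why no latent learning is needed becomes clear when you consider what a virtual world provides: exact pose metadata for every rendered pedestrian. The sketch below illustrates aspect clustering from ground-truth orientation alone; the angle binning and function name are illustrative assumptions, not the paper's code.

```python
import numpy as np

def cluster_aspects(yaw_deg, n_views=4):
    """Assign each pedestrian to a view cluster from its ground-truth yaw,
    so aspects need not be inferred as latent variables during training."""
    yaw = np.asarray(yaw_deg) % 360.0
    width = 360.0 / n_views
    # Shift by half a bin so view 0 is centered on the frontal orientation.
    return (((yaw + width / 2.0) // width) % n_views).astype(int)

print(cluster_aspects([5.0, 92.0, 180.0, 271.0, 350.0]))   # -> [0 1 2 3 0]
```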


Sensors | 2016

Pedestrian Detection at Day/Night Time with Visible and FIR Cameras: A Comparison

Alejandro González; Zhijie Fang; Yainuvis Socarras; Joan Serrat; David Vázquez; Jiaolong Xu; Antonio M. López

Despite all the significant advances in pedestrian detection brought by computer vision for driving assistance, it is still a challenging problem. One reason is the extremely varying lighting conditions under which such a detector should operate, namely day and nighttime. Recent research has shown that the combination of visible and non-visible imaging modalities may increase detection accuracy, where the infrared spectrum plays a critical role. The goal of this paper is to assess the accuracy gain of different pedestrian models (holistic, part-based, patch-based) when training with images in the far infrared spectrum. Specifically, we want to compare detection accuracy on test images recorded at day and nighttime if trained (and tested) using (a) plain color images; (b) just infrared images; and (c) both of them. In order to obtain results for the last item, we propose an early fusion approach to combine features from both modalities. We base the evaluation on a new dataset that we have built for this purpose as well as on the publicly available KAIST multispectral dataset.
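As a concrete illustration of the early fusion used for case (c), the sketch below computes a descriptor per modality and concatenates them into a single vector before any classifier sees the data. The use of skimage's HOG and the availability of registered visible/FIR crops are assumptions for illustration, not the paper's exact features.

```python
import numpy as np
from skimage.feature import hog

def fused_descriptor(visible_gray, fir_gray):
    """Concatenate per-modality HOG features into one early-fused vector."""
    f_vis = hog(visible_gray, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    f_fir = hog(fir_gray, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    return np.concatenate([f_vis, f_fir])

# Usage: a single classifier (e.g., a linear SVM) is then trained on the
# fused vectors, so cues from both spectra inform every decision jointly.
rng = np.random.default_rng(0)
vis, fir = rng.random((128, 64)), rng.random((128, 64))
x = fused_descriptor(vis, fir)
```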


IEEE Intelligent Vehicles Symposium | 2015

Multiview random forest of local experts combining RGB and LIDAR data for pedestrian detection

Alejandro González; Gabriel Villalonga; Jiaolong Xu; David Vázquez; Jaume Amores; Antonio M. López

Despite recent significant advances, pedestrian detection continues to be an extremely challenging problem in real scenarios. In order to develop a detector that successfully operates under these conditions, it becomes critical to leverage multiple cues, multiple imaging modalities and a strong multi-view classifier that accounts for different pedestrian views and poses. In this paper we provide an extensive evaluation that gives insight into how each of these aspects (multi-cue, multi-modality and strong multi-view classifier) affects performance both individually and when integrated together. In the multi-modality component we explore the fusion of RGB and depth maps obtained by high-definition LIDAR, a type of modality that has only recently started to receive attention. As our analysis reveals, although all the aforementioned aspects significantly help in improving the performance, the fusion of visible spectrum and depth information boosts the accuracy by a much larger margin. The resulting detector not only ranks among the top performers on the challenging KITTI benchmark, but it is built upon very simple blocks that are easy to implement and computationally efficient. These simple blocks can easily be replaced with more sophisticated ones recently proposed, such as convolutional neural networks for feature representation, to further improve the accuracy.
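To make the "random forest of local experts" idea concrete, the sketch below trains each tree on one randomly chosen contiguous block of a fused RGB+depth descriptor, so every tree acts as a local expert and the forest averages their votes. The use of sklearn trees, the block geometry, and all parameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class LocalExpertForest:
    """Each tree is trained on one contiguous block of the feature vector,
    acting as a 'local expert'; the forest averages the experts' votes."""

    def __init__(self, n_trees=50, block_dim=64, seed=0):
        self.n_trees, self.block_dim = n_trees, block_dim
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        d = X.shape[1]
        self.experts = []
        for _ in range(self.n_trees):
            start = self.rng.integers(0, d - self.block_dim + 1)
            idx = np.arange(start, start + self.block_dim)   # one local block
            tree = DecisionTreeClassifier(max_depth=8).fit(X[:, idx], y)
            self.experts.append((idx, tree))
        return self

    def predict_proba(self, X):
        votes = [t.predict_proba(X[:, idx])[:, 1] for idx, t in self.experts]
        return np.mean(votes, axis=0)                        # averaged votes
```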


International Journal of Computer Vision | 2016

Hierarchical Adaptive Structural SVM for Domain Adaptation

Jiaolong Xu; Sebastian Ramos; David Vázquez; Antonio M. López

A key topic in classification is the accuracy loss produced when the data distribution in the training (source) domain differs from that in the testing (target) domain. This is being recognized as a very relevant problem for many computer vision tasks such as image classification, object detection, and object category recognition. In this paper, we present a novel domain adaptation method that leverages multiple target domains (or sub-domains) in a hierarchical adaptation tree. The core idea is to exploit the commonalities and differences of the jointly considered target domains. Given the relevance of structural SVM (SSVM) classifiers, we apply our idea to the adaptive SSVM (A-SSVM; Xu et al., IEEE Trans Pattern Anal Mach Intell 36(12):2367–2380, 2014a), which only requires the target domain samples together with the existing source-domain classifier for performing the desired adaptation. Altogether, we term our proposal hierarchical A-SSVM (HA-SSVM). As proof of concept we use HA-SSVM for pedestrian detection, object category recognition and face recognition. In the former we apply HA-SSVM to the deformable part-based model (DPM; Felzenszwalb et al., IEEE Trans Pattern Anal Mach Intell 32(9):1627–1645, 2010), while in the latter two HA-SSVM is applied to multi-category classifiers. We show how HA-SSVM is effective in increasing the detection/recognition accuracy with respect to adaptation strategies that ignore the structure of the target data. Since the sub-domains of the target data are not always known a priori, we also show how HA-SSVM can incorporate sub-domain discovery for object category recognition.
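A minimal sketch of the hierarchical scheme, reusing the illustrative adapt_linear_svm() from the A-SSVM sketch above: each node of the target-domain tree adapts from its parent's weights, so sibling sub-domains share the part of the adaptation accumulated in their common ancestors. The tree encoding and helper names are assumptions, not the paper's code.

```python
def adapt_tree(w_source, root, data):
    """root: nested dict {'name': str, 'children': [...]}; data maps each
    node name to its (X, y) target samples. Returns {name: adapted weights}."""
    adapted = {}
    def recurse(parent_w, node):
        X, y = data[node['name']]
        w = adapt_linear_svm(parent_w, X, y)   # stay close to the parent
        adapted[node['name']] = w
        for child in node.get('children', []):
            recurse(w, child)                  # children adapt from this node
    recurse(w_source, root)
    return adapted
```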


IEEE Transactions on Image Processing | 2015

Recognizing Actions Through Action-Specific Person Detection

Fahad Shahbaz Khan; Jiaolong Xu; Joost van de Weijer; Andrew D. Bagdanov; Rao Muhammad Anwer; Antonio M. López

Action recognition in still images is a challenging problem in computer vision. To facilitate comparative evaluation independently of person detection, the standard evaluation protocol for action recognition uses an oracle person detector to obtain perfect bounding box information at both training and test time. The assumption is that, in practice, a general person detector will provide candidate bounding boxes for action recognition. In this paper, we argue that this paradigm is suboptimal and that action class labels should already be considered during the detection stage. Motivated by the observation that body pose is strongly conditioned on action class, we show that: 1) existing state-of-the-art generic person detectors are not adequate for proposing candidate bounding boxes for action classification; 2) due to limited training examples, directly training action-specific person detectors is also inadequate; and 3) using only a small number of labeled action examples, transfer learning is able to adapt an existing detector to propose higher-quality bounding boxes for subsequent action classification. To the best of our knowledge, we are the first to investigate transfer learning for the task of action-specific person detection in still images. We perform extensive experiments on two benchmark data sets: 1) Stanford-40 and 2) PASCAL VOC 2012. For the action detection task (i.e., both person localization and classification of the action performed), our approach outperforms methods based on general person detection by 5.7% mean average precision (MAP) on Stanford-40 and 2.1% MAP on PASCAL VOC 2012. Our approach also significantly outperforms the state of the art with a MAP of 45.4% on Stanford-40 and 31.4% on PASCAL VOC 2012. We also evaluate our action detection approach for the task of action classification (i.e., recognizing actions without localizing them). For this task, our approach, without using any ground-truth person localization at test time, outperforms state-of-the-art methods that do use person locations on both data sets.


British Machine Vision Conference | 2014

Incremental Domain Adaptation of Deformable Part-based Models

Jiaolong Xu; Sebastian Ramos; David Vázquez; Antonio M. López

Nowadays, classifiers play a core role in many computer vision tasks. The underlying assumption for learning classifiers is that the training set and the deployment environment (testing) follow the same probability distribution regarding the features used by the classifiers. However, in practice, there are different reasons that can break this constancy assumption. Accordingly, reusing existing classifiers by adapting them from the previous training environment (source domain) to the new testing one (target domain) is an approach with increasing acceptance in the computer vision community. In this paper we focus on the domain adaptation of deformable part-based models (DPMs) for object detection. In particular, we focus on a relatively unexplored scenario, i.e. incremental domain adaptation for object detection assuming weak labeling. Therefore, our algorithm is ready to improve existing source-oriented DPM-based detectors as soon as a small amount of labeled target-domain training data is available, and keeps improving as more such data arrives in a continuous fashion. To achieve this, we follow a multiple instance learning (MIL) paradigm that operates on an incremental per-image basis. As proof of concept, we address the challenging scenario of adapting a DPM-based pedestrian detector trained with synthetic pedestrians to operate in real-world scenarios. The obtained results show that our incremental adaptive models achieve accuracy on par with the batch-learned models, while being more flexible for handling continuously arriving target-domain data.
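The per-image MIL update can be sketched as follows: each weakly labeled image is treated as a positive bag, its highest-scoring window is taken as the positive instance, and the detector weights are nudged online against hard negatives. The linear scoring, margins, and learning rate are illustrative assumptions, not the paper's DPM-specific procedure.

```python
import numpy as np

def mil_update(w, pos_bag, neg_windows, lr=0.01, margin=1.0):
    """One per-image step: the positive bag's max-scoring window is its
    positive instance; hard negative windows push the weights back."""
    best = pos_bag[np.argmax(pos_bag @ w)]      # MIL: pick the best instance
    if best @ w < margin:                       # not yet confidently positive
        w = w + lr * best
    for x in neg_windows:
        if x @ w > -margin:                     # hard negative window
            w = w - lr * x
    return w
```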


IEEE Intelligent Vehicles Symposium | 2013

Learning a multiview part-based model in virtual world for pedestrian detection

Jiaolong Xu; David Vázquez; Antonio M. López; Javier Marín; Daniel Ponsa

State-of-the-art deformable part-based models based on latent SVM have shown excellent results on human detection. In this paper, we propose to train a multiview deformable part-based model with automatically generated part examples from virtual-world data. The method is efficient as: (i) the part detectors are trained with precisely extracted virtual examples, so no latent learning is needed; (ii) the multiview pedestrian detector enhances the performance of the pedestrian root model; (iii) a top-down approach is used for part detection, which reduces the search space. We evaluate our model on the Daimler and Karlsruhe Pedestrian Benchmarks with the publicly available Caltech pedestrian detection evaluation framework; the results outperform the state-of-the-art latent SVM V4.0 in both average miss rate and speed (our detector is ten times faster).
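Point (iii), the top-down part search, can be sketched as follows: once a root window is detected, each part filter is scored only in a small neighborhood around its expected anchor inside that window, instead of over the whole image. The dense score map, anchor encoding, and search radius are illustrative assumptions, not the paper's code.

```python
import numpy as np

def search_part(score_map, root_xy, anchor, radius=8):
    """Score a part only near its expected anchor inside a root detection.

    score_map: 2-D array of part-filter responses over the whole image.
    root_xy:   (x, y) of the detected root window's top-left corner.
    anchor:    expected (dx, dy) offset of the part relative to the root.
    """
    cx, cy = root_xy[0] + anchor[0], root_xy[1] + anchor[1]
    x0, y0 = max(cx - radius, 0), max(cy - radius, 0)
    local = score_map[y0:cy + radius + 1, x0:cx + radius + 1]  # small window
    dy, dx = np.unravel_index(np.argmax(local), local.shape)
    return (x0 + dx, y0 + dy), local[dy, dx]
```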


arXiv: Computer Vision and Pattern Recognition | 2017

From Virtual to Real World Visual Perception Using Domain Adaptation—The DPM as Example

Antonio M. López; Jiaolong Xu; Jose Luis Gomez; David Vázquez; German Ros

Supervised learning tends to produce more accurate classifiers than unsupervised learning in general. This implies that annotated training data is preferred. When addressing visual perception challenges, such as localizing certain object classes within an image, the learning of the involved classifiers turns out to be a practical bottleneck. The reason is that, at the very least, we have to frame object examples with bounding boxes in thousands of images. A priori, the more complex the model is regarding its number of parameters, the more annotated examples are required. This annotation task is performed by human oracles, which results in inaccuracies and errors in the annotations (a.k.a. ground truth), since the task is inherently cumbersome and sometimes ambiguous. As an alternative, we have pioneered the use of virtual worlds for collecting such annotations automatically and with high precision. However, since the models learned with virtual data must operate in the real world, we still need to perform domain adaptation (DA). In this chapter, we revisit the DA of a deformable part-based model (DPM) as an exemplifying case of virtual-to-real-world DA. As a use case, we address the challenge of vehicle detection for driver assistance, using different publicly available virtual-world data. While doing so, we investigate questions such as how the domain gap due to virtual-vs-real data behaves with respect to the dominant object appearance per domain, as well as the role of photo-realism in the virtual world.


Computer Vision and Pattern Recognition | 2013

Weakly Supervised Automatic Annotation of Pedestrian Bounding Boxes

David Vázquez; Jiaolong Xu; Sebastian Ramos; Antonio M. López; Daniel Ponsa

Among the components of a pedestrian detector, its trained pedestrian classifier is crucial for achieving the desired performance. The initial task of the training process consists in collecting samples of pedestrians and background, which involves tiresome manual annotation of pedestrian bounding boxes (BBs). Thus, recent works have assessed the use of automatically collected samples from photo-realistic virtual worlds. However, learning from virtual-world samples and testing on real-world images may suffer from the dataset shift problem. Accordingly, in this paper we assess a strategy to collect samples from the real world and retrain with them, thus avoiding the dataset shift, but in such a way that no BBs of real-world pedestrians have to be provided. In particular, we train a pedestrian classifier based on virtual-world samples (no human annotation required). Then, using such a classifier, we collect pedestrian samples from real-world images by detection. Afterwards, a human oracle efficiently rejects the false detections (weak annotation). Finally, a new classifier is trained with the accepted detections. We show that this classifier is competitive with respect to the counterpart trained with samples collected by manually annotating hundreds of pedestrian BBs.
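The overall pipeline reduces to a short loop, sketched below under the assumption of three hypothetical callables (detect, oracle_accepts, retrain) standing in for the virtual-world detector, the human yes/no judgment, and the final training step; none of them are the authors' API.

```python
def collect_weak_annotations(images, detect, oracle_accepts, retrain):
    """detect(img) -> candidate boxes from the virtual-world classifier;
    oracle_accepts(img, box) -> bool (a single human yes/no click);
    retrain(samples) -> new classifier trained on the accepted crops."""
    accepted = []
    for img in images:
        for box in detect(img):
            if oracle_accepts(img, box):   # weak annotation: no BB drawing
                accepted.append((img, box))
    return retrain(accepted)
```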

Collaboration


Dive into Jiaolong Xu's collaborations.

Top Co-Authors

Antonio M. López, Autonomous University of Barcelona
David Vázquez, Autonomous University of Barcelona
Sebastian Ramos, Autonomous University of Barcelona
Daniel Ponsa, Autonomous University of Barcelona
Alejandro González, Autonomous University of Barcelona
Gabriel Villalonga, Autonomous University of Barcelona
German Ros, Autonomous University of Barcelona
Azadeh Sadat Mozafari, Autonomous University of Barcelona
Javier Marín, Autonomous University of Barcelona
Joan Serrat, Autonomous University of Barcelona