Gaowen Liu
University of Trento
Publication
Featured research published by Gaowen Liu.
IEEE Transactions on Image Processing | 2014
Yan Yan; Elisa Ricci; Ramanathan Subramanian; Gaowen Liu; Nicu Sebe
Robust action recognition under viewpoint changes has received considerable attention recently. To this end, self-similarity matrices (SSMs) have been found to be effective view-invariant action descriptors. To enhance the performance of SSM-based methods, we propose multitask linear discriminant analysis (LDA), a novel multitask learning framework for multiview action recognition that allows for the sharing of discriminative SSM features among different views (i.e., tasks). Inspired by the mathematical connection between multivariate linear regression and LDA, we model multitask multiclass LDA as a single optimization problem by choosing an appropriate class indicator matrix. In particular, we propose two variants of graph-guided multitask LDA: 1) where the graph weights specifying view dependencies are fixed a priori and 2) where graph weights are flexibly learnt from the training data. We evaluate the proposed methods extensively on multiview RGB and RGBD video data sets, and experimental results confirm that the proposed approaches compare favorably with the state-of-the-art.
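The regression/LDA connection the abstract builds on can be made concrete with the known least-squares formulation of multiclass LDA, where a specially scaled class indicator matrix turns LDA into a linear regression problem. Below is a minimal single-task NumPy sketch of that construction; it is our illustration only, and the paper's multitask, graph-guided objective is not reproduced here.

```python
import numpy as np

def lda_indicator(labels, n_classes):
    """Scaled class indicator matrix linking least squares to multiclass LDA
    (the standard least-squares-LDA construction; data assumed centered)."""
    n = len(labels)
    Y = np.zeros((n, n_classes))
    for k in range(n_classes):
        n_k = np.sum(labels == k)
        Y[:, k] = -np.sqrt(n_k / n)                              # off-class entries
        Y[labels == k, k] = np.sqrt(n / n_k) - np.sqrt(n_k / n)  # in-class entries
    return Y

def lda_via_regression(X, labels, n_classes, lam=1e-3):
    """Solve min_W ||Xc W - Y||_F^2 + lam ||W||_F^2 on centered features;
    the columns of W span the LDA discriminant subspace."""
    Xc = X - X.mean(axis=0)
    Y = lda_indicator(labels, n_classes)
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ Y)
```

A useful sanity check on the indicator matrix is that each of its columns sums to zero, which is what makes the regression target consistent with centered data.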
IEEE Transactions on Image Processing | 2015
Yan Yan; Yi Yang; Deyu Meng; Gaowen Liu; Wei Tong; Alexander G. Hauptmann; Nicu Sebe
Complex event detection is a retrieval task with the goal of finding videos of a particular event in a large-scale unconstrained Internet video archive, given example videos and text descriptions. Nowadays, different multimodal fusion schemes of low-level and high-level features are extensively investigated and evaluated for the complex event detection task. However, how to effectively select semantically meaningful high-level concepts from a large pool to assist complex event detection is rarely studied in the literature. In this paper, we propose a novel strategy to automatically select semantically meaningful concepts for the event detection task based on both the event-kit text descriptions and the concepts' high-level feature descriptions. Moreover, we introduce a novel event-oriented dictionary representation based on the selected semantic concepts. Toward this goal, we leverage training images (frames) of selected concepts from the semantic indexing dataset, with a pool of 346 concepts, in a novel supervised multitask ℓp-norm dictionary learning framework. Extensive experimental results on the TRECVID multimedia event detection dataset demonstrate the efficacy of our proposed method.
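As a rough illustration of the concept selection idea, concepts can be ranked by the textual similarity between the event-kit description and each concept's description. The sketch below uses a plain bag-of-words cosine similarity; this is our simplification, and the paper's actual selection strategy and feature descriptions are not shown here.

```python
import numpy as np
from collections import Counter

def select_concepts(event_text, concept_texts, k=5):
    """Rank candidate concepts by bag-of-words cosine similarity to an
    event-kit description; return the indices of the top-k concepts.
    A simplified stand-in for the paper's selection strategy."""
    vocab = sorted(set(event_text.lower().split()) |
                   {w for t in concept_texts for w in t.lower().split()})

    def vec(text):
        c = Counter(text.lower().split())
        return np.array([c[w] for w in vocab], dtype=float)

    e = vec(event_text)
    sims = []
    for t in concept_texts:
        v = vec(t)
        denom = np.linalg.norm(e) * np.linalg.norm(v)
        sims.append(float(v @ e / denom) if denom else 0.0)
    order = np.argsort(sims)[::-1]   # most similar concept first
    return order[:k]
```

For example, an event kit describing a dog show would rank a "dog grooming" concept above unrelated concepts such as "airplane landing".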
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2016
Yan Yan; Elisa Ricci; Ramanathan Subramanian; Gaowen Liu; Oswald Lanz; Nicu Sebe
Recently, head pose estimation (HPE) from low-resolution surveillance data has gained in importance. However, monocular and multi-view HPE approaches still work poorly under target motion, as facial appearance is distorted by camera perspective and scale changes when a person moves around. To this end, we propose FEGA-MTL, a novel framework based on Multi-Task Learning (MTL) for classifying the head pose of a person who moves freely in an environment monitored by multiple large field-of-view surveillance cameras. Upon partitioning the monitored scene into a dense uniform spatial grid, FEGA-MTL simultaneously clusters grid partitions into regions with similar facial appearance while learning region-specific head pose classifiers. In the learning phase, guided by two graphs which a priori model the similarity among (1) grid partitions based on camera geometry and (2) head pose classes, FEGA-MTL derives the optimal scene partitioning and the associated pose classifiers. Upon determining the target's position with a person tracker at test time, the corresponding region-specific classifier is invoked for HPE. The FEGA-MTL framework naturally extends to a weakly supervised setting where the target's walking direction is employed as a proxy in lieu of head orientation. Experiments confirm that FEGA-MTL significantly outperforms competing single-task and multi-task learning methods in multi-view settings.
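The core mechanism of graph-guided multi-task learning with fixed a priori graph weights can be sketched as per-task ridge regressions coupled through a graph Laplacian penalty. The toy version below is our own simplification in NumPy; FEGA-MTL's actual objective, appearance features, and classifier form differ, and the function name is ours.

```python
import numpy as np

def graph_guided_mtl(Xs, ys, G, lam=1.0):
    """Jointly fit one linear model per task (e.g., per scene region),
    coupling tasks through a fixed similarity graph G:
        min_W  sum_t ||X_t w_t - y_t||^2 + lam * tr(W^T L W),
    where L is the graph Laplacian of G and row t of W is w_t.
    Solved in closed form via one block linear system."""
    T, d = len(Xs), Xs[0].shape[1]
    L = np.diag(G.sum(axis=1)) - G            # graph Laplacian
    A = np.zeros((T * d, T * d))
    b = np.zeros(T * d)
    for t in range(T):
        A[t*d:(t+1)*d, t*d:(t+1)*d] += Xs[t].T @ Xs[t]
        b[t*d:(t+1)*d] = Xs[t].T @ ys[t]
    A += lam * np.kron(L, np.eye(d))          # coupling term: lam * (L kron I)
    w = np.linalg.solve(A + 1e-8 * np.eye(T * d), b)
    return w.reshape(T, d)
```

With zero graph weights the tasks decouple into independent regressions; as the weights (or lam) grow, the per-task models are pulled toward each other, which is the sense in which the a priori graphs guide learning.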
international conference on image processing | 2013
Yan Yan; Gaowen Liu; Elisa Ricci; Nicu Sebe
Action recognition is a central problem in many practical applications, such as video annotation, video surveillance and human-computer interaction. Most action recognition approaches are currently based on localized spatio-temporal features that can vary significantly when the viewpoint changes. Therefore, the performance rapidly drops when training and test data correspond to different cameras/viewpoints. Recently, Self-Similarity Matrix (SSM) features have been introduced to circumvent this problem. To improve the performance of current SSM-based methods, in this paper we propose a multi-task learning framework for multi-view action recognition where discriminative SSM features are shared among different views. Inspired by the mathematical connection between multivariate linear regression and Linear Discriminant Analysis (LDA), we propose a novel learning algorithm, where a single optimization framework is defined for multi-task multi-class LDA by choosing an appropriate class indicator matrix. Experimental results on the popular IXMAS dataset demonstrate that our approach achieves accurate performance and compares favorably with state-of-the-art methods.
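The SSM descriptor mentioned above is simple to state: for a sequence of per-frame feature vectors, it is the matrix of pairwise distances between frames, and this distance pattern is largely stable under viewpoint change. A minimal NumPy sketch follows; the per-frame descriptors and the distance used in the paper are not specified here, so Euclidean distance is our assumption.

```python
import numpy as np

def self_similarity_matrix(frames):
    """Pairwise Euclidean distances between per-frame descriptors.

    frames: (T, d) array, one d-dimensional descriptor per frame.
    Returns a (T, T) symmetric matrix with a zero diagonal.
    """
    diff = frames[:, None, :] - frames[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# Toy sequence: 4 frames with 2-D descriptors; frames 0 and 3 are identical,
# so their SSM entry is zero regardless of the (virtual) viewpoint.
seq = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
ssm = self_similarity_matrix(seq)
```

Because the matrix depends only on distances between frames, a rotation of the descriptor space (a crude proxy for a camera change) leaves it unchanged.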
Computer Vision and Image Understanding | 2014
Yan Yan; Haoquan Shen; Gaowen Liu; Zhigang Ma; Chenqiang Gao; Nicu Sebe
The selection of discriminative features is an important and effective technique for many computer vision and multimedia tasks. Using irrelevant features in classification or clustering tasks could deteriorate the performance. Thus, designing efficient feature selection algorithms to remove the irrelevant features is a possible way to improve the classification or clustering performance. With the successful use of sparse models in image and video classification and understanding, imposing structural sparsity in feature selection has been widely investigated during the past years. Motivated by the merit of sparse models, in this paper we propose a novel feature selection method using a sparse model. Different from the state of the art, our method is built upon the ℓ2,p-norm and simultaneously considers both the global and local (GLocal) structures of data distribution. Our method is more flexible in selecting the discriminating features as it is able to control the degree of sparseness. Moreover, considering both global and local structures of data distribution makes our feature selection process more effective. An efficient algorithm is proposed to solve the ℓ2,p-norm joint sparsity optimization problem in this paper. Experimental results performed on real-world image and video datasets show the effectiveness of our feature selection method compared to several state-of-the-art methods.
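For the special case p = 1, the joint-sparsity problem above has a well-known iteratively reweighted least squares solver, which conveys the flavor of the approach: the ℓ2,1 penalty drives whole rows of the projection matrix to zero, and features are ranked by their row norms. The sketch below covers the global term only, in our own simplified form; the paper's GLocal objective and general p are not reproduced.

```python
import numpy as np

def l21_feature_selection(X, Y, lam=0.1, n_iter=30):
    """Iteratively reweighted least squares for
        min_W ||X W - Y||_F^2 + lam * ||W||_{2,1}.
    The row-sparse W yields a feature ranking: indices of X's columns,
    most relevant first.  Illustrative p = 1 case only."""
    n, d = X.shape
    D = np.eye(d)                                # reweighting matrix, identity at start
    for _ in range(n_iter):
        W = np.linalg.solve(X.T @ X + lam * D, X.T @ Y)
        row_norms = np.linalg.norm(W, axis=1)
        D = np.diag(1.0 / (2.0 * np.maximum(row_norms, 1e-8)))
    return np.argsort(-np.linalg.norm(W, axis=1))
```

On data where the target depends on a single feature, that feature's row of W survives the shrinkage and tops the ranking.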
international conference on multimedia retrieval | 2014
Chenqiang Gao; Deyu Meng; Wei Tong; Yi Yang; Yang Cai; Haoquan Shen; Gaowen Liu; Shicheng Xu; Alexander G. Hauptmann
Event detection in real surveillance videos with complicated background environments is a very hard task. Unlike the traditional retrospective and interactive systems designed for this task, which mainly operate on video fragments located within the event-occurrence time, in this paper we propose a new interactive system built on mid-level discriminative representations (patches/shots) that are closely related to the event (and may occur beyond the event-occurrence period) and are easier to detect than video fragments. By virtue of such easily distinguished mid-level patterns, our framework realizes an effective division of labor between computers and human participants. The computers' task is to train classifiers on a bunch of mid-level discriminative representations and to sort all the candidate mid-level representations in the evaluation sets by classifier score. The human participants' task is then to readily search for the events based on the clues offered by these sorted mid-level representations. For computers, such mid-level representations, with more concise and consistent patterns, can be detected more accurately than the video fragments used in the conventional framework; on the other hand, a human participant can search for events of interest implicated by these location-anchored mid-level representations much more easily than in conventional video fragments containing entire scenes. Both of these properties make our framework practical for real surveillance event detection applications.
acm multimedia | 2013
Yan Yan; Zhongwen Xu; Gaowen Liu; Zhigang Ma; Nicu Sebe
The selection of discriminative features is an important and effective technique for many multimedia tasks. Using irrelevant features in classification or clustering tasks could deteriorate the performance. Thus, designing efficient feature selection algorithms to remove the irrelevant features is a possible way to improve the classification or clustering performance. With the successful use of sparse models in image and video classification and understanding, imposing structural sparsity in feature selection has been widely investigated during the past years. Motivated by the merit of sparse models, we propose a novel feature selection method using a sparse model in this paper. Different from the state of the art, our method is built upon the ℓ2,p-norm and simultaneously considers both the global and local (GLocal) structures of data distribution. Our method is more flexible in selecting the discriminating features as it is able to control the degree of sparseness. Moreover, considering both global and local structures of data distribution makes our feature selection process more effective. An efficient algorithm is proposed to solve the ℓ2,p-norm joint sparsity optimization problem.
asian conference on computer vision | 2014
Yan Yan; Elisa Ricci; Gaowen Liu; Nicu Sebe
international conference on pattern recognition | 2014
Yan Yan; Elisa Ricci; Gaowen Liu; Ramanathan Subramanian; Nicu Sebe
World Wide Web | 2016
Gaowen Liu; Yan Yan; Ramanathan Subramanian; Jingkuan Song; Guoyu Lu; Nicu Sebe