Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Boqing Gong is active.

Publication


Featured research published by Boqing Gong.


Computer Vision and Pattern Recognition | 2012

Geodesic flow kernel for unsupervised domain adaptation

Boqing Gong; Yuan Shi; Fei Sha; Kristen Grauman

In real-world applications of visual recognition, many factors - such as pose, illumination, or image quality - can cause a significant mismatch between the source domain on which classifiers are trained and the target domain to which those classifiers are applied. As such, the classifiers often perform poorly on the target domain. Domain adaptation techniques aim to correct the mismatch. Existing approaches have concentrated on learning feature representations that are invariant across domains, and they often do not directly exploit low-dimensional structures that are intrinsic to many vision datasets. In this paper, we propose a new kernel-based method that takes advantage of such structures. Our geodesic flow kernel models domain shift by integrating an infinite number of subspaces that characterize changes in geometric and statistical properties from the source to the target domain. Our approach is computationally advantageous, automatically inferring important algorithmic parameters without requiring extensive cross-validation or labeled data from either domain. We also introduce a metric that reliably measures the adaptability between a pair of source and target domains. For a given target domain and several source domains, the metric can be used to automatically select the optimal source domain to adapt and avoid less desirable ones. Empirical studies on standard datasets demonstrate the advantages of our approach over competing methods.
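The geodesic flow kernel has a closed form in terms of the principal angles between the source and target subspaces. Below is a minimal numpy sketch of that construction (not the authors' released implementation); Ps and Pt are assumed to be orthonormal bases, e.g., the top principal components of each domain.

import numpy as np

def geodesic_flow_kernel(Ps, Pt):
    """Sketch of the closed-form GFK matrix G (D x D).

    Ps, Pt: (D, d) orthonormal bases of the source / target subspaces.
    The kernel value between samples x and y is then x.T @ G @ y.
    """
    D, d = Ps.shape
    # Orthonormal basis of the complement of the source subspace.
    Q, _ = np.linalg.qr(Ps, mode="complete")
    Rs = Q[:, d:]
    # CS decomposition: Ps.T @ Pt = U1 diag(cos t) V.T and
    # Rs.T @ Pt = -U2 diag(sin t) V.T share the same V.
    U1, cos_t, Vt = np.linalg.svd(Ps.T @ Pt)
    theta = np.arccos(np.clip(cos_t, -1.0, 1.0))
    sin_t = np.sqrt(np.maximum(1.0 - cos_t**2, 1e-24))
    U2 = -(Rs.T @ Pt) @ Vt.T / sin_t
    # Integrate the cos/sin products over the geodesic, t in [0, 1].
    two_t = np.maximum(2.0 * theta, 1e-12)
    lam1 = 0.5 * (1.0 + np.sin(two_t) / two_t)
    lam2 = 0.5 * (np.cos(two_t) - 1.0) / two_t
    lam3 = 0.5 * (1.0 - np.sin(two_t) / two_t)
    P1, P2 = Ps @ U1, Rs @ U2
    return ((P1 * lam1) @ P1.T + (P1 * lam2) @ P2.T
            + (P2 * lam2) @ P1.T + (P2 * lam3) @ P2.T)

Target samples can then be classified with, e.g., a 1-NN classifier under the kernel-induced distance x.T @ G @ x - 2 * x.T @ G @ y + y.T @ G @ y.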


ACM Multimedia | 2009

Automatic facial expression recognition on a single 3D face by exploring shape deformation

Boqing Gong; Yueming Wang; Jianzhuang Liu; Xiaoou Tang

Facial expression recognition has many applications in multimedia processing, and the development of 3D data acquisition techniques makes it possible to identify expressions using 3D shape information. In this paper, we propose an automatic facial expression recognition approach based on a single 3D face. The shape of an expressional 3D face is approximated as the sum of two parts: a basic facial shape component (BFSC) and an expressional shape component (ESC). The BFSC represents the basic face structure and neutral-style shape, while the ESC contains the shape changes caused by facial expressions. To separate the BFSC and ESC, our method first builds, by a learning method, a reference face for each input non-neutral 3D face; this reference represents the basic facial shape well. Then, a facial expression descriptor that accounts for surface depth changes is computed from the BFSC and the original expressional face. Finally, the descriptor is fed into an SVM to recognize the expression. Unlike previous methods, which recognize a facial expression with the help of manually labeled key points and/or a neutral face, our method works on a single 3D face without any manual assistance. Extensive experiments are carried out on the BU-3DFE database, with comparisons against existing methods. The experimental results show the effectiveness of our method.
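As a loose illustration of the decomposition, the ESC is just the per-vertex (or per-pixel) depth change between the input face and its estimated reference face. The sketch below pools those changes over a coarse grid into a descriptor; the grid pooling and names are hypothetical simplifications, and the paper's actual descriptor is more involved.

import numpy as np

def expression_descriptor(depth, reference_depth, grid=(8, 8)):
    # ESC = input depth map minus the BFSC (reference face) depth map.
    esc = depth - reference_depth
    gh, gw = grid
    H, W = esc.shape
    crop = esc[: H // gh * gh, : W // gw * gw]
    cells = crop.reshape(gh, H // gh, gw, W // gw)
    return cells.mean(axis=(1, 3)).ravel()  # one average depth change per cell

The resulting descriptor would then be fed to an SVM classifier, e.g., sklearn.svm.SVC.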


European Conference on Computer Vision | 2016

An Empirical Study and Analysis of Generalized Zero-Shot Learning for Object Recognition in the Wild

Wei-Lun Chao; Soravit Changpinyo; Boqing Gong; Fei Sha

We investigate the problem of generalized zero-shot learning (GZSL). GZSL relaxes the unrealistic assumption in conventional zero-shot learning (ZSL) that test data belong only to unseen novel classes. In GZSL, test data might also come from seen classes and the labeling space is the union of both types of classes. We show empirically that a straightforward application of classifiers provided by existing ZSL approaches does not perform well in the setting of GZSL. Motivated by this, we propose a surprisingly simple but effective method to adapt ZSL approaches for GZSL. The main idea is to introduce a calibration factor to calibrate the classifiers for both seen and unseen classes so as to balance two conflicting forces: recognizing data from seen classes and those from unseen ones. We develop a new performance metric called the Area Under Seen-Unseen accuracy Curve to characterize this trade-off. We demonstrate the utility of this metric by analyzing existing ZSL approaches applied to the generalized setting. Extensive empirical studies reveal strengths and weaknesses of those approaches on three well-studied benchmark datasets, including the large-scale ImageNet with more than 20,000 unseen categories. We complement our comparative studies in learning methods by further establishing an upper bound on the performance limit of GZSL. In particular, our idea is to use class-representative visual features as the idealized semantic embeddings. We show that there is a large gap between the performance of existing approaches and the performance limit, suggesting that improving the quality of class semantic embeddings is vital to improving ZSL.
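The calibration idea amounts to subtracting a constant from the scores of seen classes before taking the argmax, and the AUSUC is the area swept out as that constant varies. Below is a minimal per-sample sketch of both (the paper averages accuracy per class rather than per sample):

import numpy as np

def calibrated_predict(scores, seen_mask, gamma):
    # scores: (N, C) classifier scores; seen_mask: (C,) bool; gamma: calibration factor.
    return (scores - gamma * seen_mask).argmax(axis=1)

def ausuc(scores, labels, seen_mask, gammas):
    # Sweep gamma, record (unseen accuracy, seen accuracy), integrate.
    pts = []
    for g in gammas:
        pred = calibrated_predict(scores, seen_mask, g)
        on_seen = seen_mask[labels]  # which samples come from seen classes
        acc_s = (pred[on_seen] == labels[on_seen]).mean()
        acc_u = (pred[~on_seen] == labels[~on_seen]).mean()
        pts.append((acc_u, acc_s))
    pts.sort()
    xs, ys = zip(*pts)
    return np.trapz(ys, xs)  # area under the seen-unseen accuracy curve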


Computer Vision and Pattern Recognition | 2016

Learning Attributes Equals Multi-Source Domain Generalization

Chuang Gan; Tianbao Yang; Boqing Gong

Attributes possess appealing properties and benefit many computer vision problems, such as object recognition, learning with humans in the loop, and image retrieval. Whereas existing work mainly pursues applications of attributes to various computer vision problems, we contend that the most basic problem, how to accurately and robustly detect attributes from images, has been left underexplored. In particular, existing work rarely addresses explicitly the requirement that attribute detectors generalize well across different categories, including those previously unseen. Noting that this is analogous to the objective of multi-source domain generalization if we treat each category as a domain, we provide a novel perspective on attribute detection and propose to adapt techniques from multi-source domain generalization to learn cross-category generalizable attribute detectors. We validate our understanding and approach with extensive experiments on four challenging datasets and three different problems.
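The category-as-domain view also implies the right evaluation protocol: hold out entire categories, not random images, when testing an attribute detector. A small sketch of such a split (the data layout is a hypothetical stand-in):

def leave_categories_out(samples, held_out):
    # samples: iterable of (feature, attribute_label, category) triples.
    # Train on all categories except held_out; test attribute detection on held_out.
    train = [s for s in samples if s[2] not in held_out]
    test = [s for s in samples if s[2] in held_out]
    return train, test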


International Journal of Computer Vision | 2014

Learning Kernels for Unsupervised Domain Adaptation with Applications to Visual Object Recognition

Boqing Gong; Kristen Grauman; Fei Sha

Domain adaptation aims to correct the mismatch in statistical properties between the source domain on which a classifier is trained and the target domain to which the classifier is to be applied. In this paper, we address the challenging scenario of unsupervised domain adaptation, where the target domain does not provide any annotated data to assist in adapting the classifier. Our strategy is to learn robust features that are resilient to the mismatch across domains and then use them to construct classifiers that will perform well on the target domain. To this end, we propose novel kernel learning approaches to infer such features for adaptation. Concretely, we explore two closely related directions. In the first, we propose unsupervised learning of a geodesic flow kernel (GFK). The GFK summarizes the inner products in an infinite sequence of feature subspaces that smoothly interpolates between the source and target domains. In the second, we propose supervised learning of a kernel that discriminatively combines multiple base GFKs. These base kernels model the source and target domains at fine granularities. In particular, each base kernel pivots on a different set of landmarks, the most useful data instances that reveal the similarity between the source and target domains, thus bridging them to achieve adaptation. Our approaches are computationally convenient, automatically infer important hyper-parameters, and are capable of learning features and classifiers discriminatively without demanding labeled data from the target domain. In extensive empirical studies on standard benchmark recognition datasets, our approaches yield state-of-the-art results compared to a variety of competing methods.
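The second direction reduces, at inference time, to a convex combination of base GFK matrices, each computed from subspaces built on a different landmark set. The sketch below only shows the fixed combination step; in the paper the weights are learned discriminatively via multiple kernel learning, which is omitted here.

import numpy as np

def combine_base_gfks(base_kernels, weights):
    # base_kernels: list of (n, n) kernel matrices, one per landmark set.
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)
    w = w / max(w.sum(), 1e-12)  # keep the combination convex
    return sum(wi * K for wi, K in zip(w, base_kernels))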


IEEE Transactions on Multimedia | 2013

Learning Semantic Signatures for 3D Object Retrieval

Boqing Gong; Jianzhuang Liu; Xiaogang Wang; Xiaoou Tang

In this paper, we propose two kinds of semantic signatures for 3D object retrieval (3DOR). Humans are capable of describing an object using attribute terms like “symmetric” and “flyable”, or using its similarities to some known object classes. We convert such qualitative descriptions into an attribute signature (AS) and a reference set signature (RSS), respectively, and use them for 3DOR. We also show that the AS and RSS can be understood as two different quantizations of the same semantic space of human descriptions of objects. The advantages of the semantic signatures are threefold. First, they are much more compact than low-level shape features yet achieve comparable retrieval accuracy, so they require less storage and computation in retrieval. Second, the high-level signatures are a good complement to low-level shape features; by incorporating the signatures, we can improve the performance of state-of-the-art 3DOR methods by a large margin. To the best of our knowledge, we obtain the best results on two popular benchmarks. Third, the AS enables us to build a user-friendly interface with which the user can trigger a search by simply clicking attribute bars instead of supplying a 3D object as the query. This interface matters in 3DOR because, while searching, the user usually does not have at hand a 3D query similar to the targeted objects in the database.
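In this scheme, an attribute signature is simply the vector of attribute classifier responses on a shape's low-level features, and retrieval becomes nearest-neighbor search in that much lower-dimensional space. A minimal sketch, where the attribute classifiers are hypothetical stand-ins:

import numpy as np

def attribute_signature(shape_feature, attribute_classifiers):
    # One score per attribute term, e.g. "symmetric", "flyable".
    return np.array([clf(shape_feature) for clf in attribute_classifiers])

def retrieve(query_signature, database_signatures, topk=10):
    # Nearest neighbors in signature space; compact signatures keep this cheap.
    dists = np.linalg.norm(database_signatures - query_signature, axis=1)
    return np.argsort(dists)[:topk]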


European Conference on Computer Vision | 2016

Webly-Supervised Video Recognition by Mutually Voting for Relevant Web Images and Web Video Frames

Chuang Gan; Chen Sun; Lixin Duan; Boqing Gong

Video recognition usually requires a large number of training samples, which are expensive to collect. An alternative, cheap solution is to draw on the large-scale images and videos available on the Web. With modern search engines, the top-ranked images or videos are usually highly correlated with the query, suggesting the potential to harvest labeling-free Web images and videos for video recognition. However, two key difficulties prevent us from using the Web data directly. First, the data are typically noisy and may come from a domain completely different from that of the user's interest (e.g., cartoons). Second, Web videos are usually untrimmed and very lengthy, with query-relevant frames often hidden among irrelevant ones. A question thus naturally arises: to what extent can such noisy Web images and videos be utilized for labeling-free video recognition? In this paper, we propose a novel approach in which relevant Web images and video frames mutually vote for each other, balancing two forces: aggressive matching and passive video frame selection. We validate our approach on three large-scale video recognition datasets.
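A crude way to see the mutual-voting idea: a web image is kept if enough video frames are similar to it, and a frame is kept if enough retained images vote for it. The sketch below uses cosine similarity with hypothetical thresholds; the paper's balancing of aggressive matching and passive frame selection is more careful than this.

import numpy as np

def mutual_vote(image_feats, frame_feats, sim_thresh=0.7, min_votes=5):
    # Normalize rows so the inner products below are cosine similarities.
    A = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    B = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)
    S = A @ B.T  # (num_images, num_frames)
    keep_images = (S > sim_thresh).sum(axis=1) >= min_votes
    keep_frames = (S[keep_images] > sim_thresh).sum(axis=0) >= min_votes
    return keep_images, keep_frames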


European Conference on Computer Vision | 2016

Query-Focused Extractive Video Summarization

Aidean Sharghi; Boqing Gong; Mubarak Shah

Video data is growing explosively. As a result of this “big video data”, intelligent algorithms for automatic video summarization have (re-)emerged as a pressing need. We develop a probabilistic model, the Sequential and Hierarchical Determinantal Point Process (SH-DPP), for query-focused extractive video summarization. Given a user query and a long video sequence, our algorithm returns a summary by selecting key shots from the video. The decision to include a shot in the summary depends jointly on the shot's relevance to the user query and its importance in the context of the video. We verify our approach on two densely annotated video datasets. Query-focused video summarization is particularly useful for search engines, e.g., to display snippets of videos.
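The building block here is a determinantal point process over shots: diverse subsets get high probability because their kernel submatrix has a large determinant. Below is a generic greedy MAP sketch of that building block only; the paper's SH-DPP additionally stacks two such layers, conditioning query-relevant selections on one another.

import numpy as np

def greedy_dpp_map(L, k):
    # L: (n, n) PSD kernel over shots; greedily grow S to maximize det(L[S, S]).
    selected, rest = [], list(range(L.shape[0]))
    for _ in range(k):
        gains = {}
        for i in rest:
            S = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(S, S)])
            if sign > 0:
                gains[i] = logdet
        if not gains:
            break
        best = max(gains, key=gains.get)
        selected.append(best)
        rest.remove(best)
    return selected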


Computer Vision and Pattern Recognition | 2016

Fast Zero-Shot Image Tagging

Yang Zhang; Boqing Gong; Mubarak Shah

The well-known word analogy experiments show that recent word vectors capture fine-grained linguistic regularities via linear vector offsets, but it is unclear how well these simple offsets can encode visual regularities over words. We study a particular image-word relevance relation in this paper. Our results show that the word vectors of tags relevant to a given image rank ahead of those of irrelevant tags along a principal direction in the word vector space. Inspired by this observation, we propose to solve image tagging by estimating the principal direction for an image. In particular, we exploit linear mappings and nonlinear deep neural networks to approximate the principal direction from an input image. We arrive at a quite versatile tagging model that runs fast given a test image, in constant time with respect to the training set size. It not only gives superior performance on the conventional tagging task on the NUS-WIDE dataset, but also outperforms competitive baselines on annotating images with previously unseen tags.
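Once the principal direction for an image is estimated, tagging is a single matrix-vector product: every tag's word vector is projected onto the direction and tags are read off in score order, which is why inference cost is independent of the training set size. A sketch, with the direction coming from a hypothetical learned linear map W:

import numpy as np

def tag_image(image_feature, W, tag_word_vectors, topk=5):
    direction = W @ image_feature            # estimated principal direction
    scores = tag_word_vectors @ direction    # relevant tags should score highest
    return np.argsort(-scores)[:topk]

Unseen tags are handled for free: any new tag with a word vector can be scored without retraining.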


Computer Vision and Pattern Recognition | 2017

Query-Focused Video Summarization: Dataset, Evaluation, and a Memory Network Based Approach

Aidean Sharghi; Jacob S. Laurel; Boqing Gong

Recent years have witnessed a resurgence of interest in video summarization. However, one of the main obstacles to research on video summarization is user subjectivity: users have varying preferences over the summaries. This subjectivity causes at least two problems. First, no single video summarizer fits all users unless it interacts with and adapts to individual users. Second, it is very challenging to evaluate the performance of a video summarizer. To tackle the first problem, we explore the recently proposed query-focused video summarization, which introduces user preferences, in the form of text queries about the video, into the summarization process. We propose a memory-network-parameterized sequential determinantal point process in order to attend the user query onto different video frames and shots. To address the second challenge, we contend that a good evaluation metric for video summarization should focus on the semantic information that humans can perceive rather than on visual features or temporal overlaps. To this end, we collect dense per-video-shot concept annotations, compile a new dataset, and suggest an efficient evaluation method defined upon the concept annotations. We conduct extensive experiments contrasting our video summarizer with existing ones and present detailed analyses of the dataset and the new evaluation method.
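The proposed evaluation compares two summaries through their shots' concept annotations rather than visual or temporal overlap: shots are matched across the summaries by the overlap of their concept sets, and precision and recall follow from the matching. Below is a sketch using IoU of concept sets and maximum bipartite matching, a simplification of the paper's procedure:

import numpy as np
from scipy.optimize import linear_sum_assignment

def summary_f1(pred_shots, gold_shots):
    # Each shot is a set of concept annotations; similarity = IoU of the sets.
    S = np.array([[len(p & g) / max(len(p | g), 1) for g in gold_shots]
                  for p in pred_shots])
    rows, cols = linear_sum_assignment(-S)   # maximum-weight shot matching
    matched = S[rows, cols].sum()
    prec = matched / len(pred_shots)
    rec = matched / len(gold_shots)
    return 2 * prec * rec / max(prec + rec, 1e-12)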

Collaboration


Dive into Boqing Gong's collaborations.

Top Co-Authors

Fei Sha
University of Southern California

Kristen Grauman
University of Texas at Austin

Mubarak Shah
University of Central Florida

Wei-Lun Chao
University of Southern California

Xiaoou Tang
The Chinese University of Hong Kong