Qifan Wang
Purdue University
Publications
Featured research published by Qifan Wang.
European Conference on Computer Vision | 2014
Qifan Wang; Bin Shen; Shumiao Wang; Liang Li; Luo Si
Tags have been popularly utilized for better annotating, organizing and searching for desirable images. Image tagging is the problem of automatically assigning tags to images. One major challenge for image tagging is that the existing/training labels associated with image examples might be incomplete and noisy. Valuable prior work has focused on improving the accuracy of the assigned tags, but very limited work tackles the efficiency issue in image tagging, which is a critical problem in many large scale real world applications. This paper proposes a novel Binary Codes Embedding approach for Fast Image Tagging (BCE-FIT) with incomplete labels. In particular, we construct compact binary codes for both image examples and tags such that the observed tags are consistent with the constructed binary codes. We then formulate the problem of learning binary codes as a discrete optimization problem. An efficient iterative method is developed to solve the relaxed problem, followed by a novel binarization method based on orthogonal transformation to obtain the binary codes from the relaxed solution. Experimental results on two large scale datasets demonstrate that the proposed approach can achieve accuracy similar to that of state-of-the-art methods while using much less time, which is important for large scale applications.
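The final binarization step described above rotates the relaxed real-valued codes with an orthogonal transformation before taking signs. The sketch below illustrates that general idea with an alternating scheme (fix the binary codes, then solve an orthogonal Procrustes problem for the rotation); the function name and the specific alternating procedure are illustrative assumptions, not the exact BCE-FIT formulation.

```python
import numpy as np

def rotate_and_binarize(V, n_iter=20, seed=0):
    """Binarize a relaxed real-valued code matrix V (n_examples x n_bits) by
    finding an orthogonal rotation R that reduces the quantization error
    ||sign(V R) - V R||_F, then taking the sign.  Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    n_bits = V.shape[1]
    # start from a random orthogonal matrix
    R, _ = np.linalg.qr(rng.standard_normal((n_bits, n_bits)))
    for _ in range(n_iter):
        B = np.sign(V @ R)
        B[B == 0] = 1                      # avoid zero entries
        # best orthogonal R for fixed B: orthogonal Procrustes problem
        U, _, Vt = np.linalg.svd(V.T @ B)
        R = U @ Vt
    codes = np.sign(V @ R)
    codes[codes == 0] = 1
    return codes, R

# toy usage: 100 examples, 16-bit codes
codes, R = rotate_and_binarize(np.random.randn(100, 16))
```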
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2013
Qifan Wang; Dan Zhang; Luo Si
It is an important research problem to design efficient and effective solutions for large scale similarity search. One popular strategy is to represent data examples as compact binary codes through semantic hashing, which has produced promising results with fast search speed and low storage cost. Many existing semantic hashing methods generate binary codes for documents by modeling document relationships based on similarity in a keyword feature space. Two major limitations of existing methods are: (1) tag information is often associated with documents in many real world applications, but has not been fully exploited; (2) similarity in the keyword feature space does not fully reflect semantic relationships that go beyond keyword matching. This paper proposes a novel hashing approach, Semantic Hashing using Tags and Topic Modeling (SHTTM), to incorporate both the tag information and the similarity information from probabilistic topic modeling. In particular, a unified framework is designed to ensure that hashing codes are consistent with tag information through a formal latent factor model and to preserve the document topic/semantic similarity that goes beyond keyword matching. An iterative coordinate descent procedure is proposed for learning the optimal hashing codes. An extensive set of empirical studies on four different datasets has been conducted to demonstrate the advantages of the proposed SHTTM approach over several other state-of-the-art semantic hashing techniques. Furthermore, experimental results indicate that modeling tag information and utilizing topic modeling are each beneficial for improving the effectiveness of hashing, while combining the two techniques in the unified framework obtains even better results.
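As a rough illustration of the unified framework described above, the sketch below evaluates a simplified objective with a latent-factor tag-consistency term and a topic-similarity preservation term; all symbols and the exact form of the terms are assumptions for illustration, not the paper's notation.

```python
import numpy as np

def shttm_style_objective(B, T, U, S, lam=1.0):
    """Simplified objective in the spirit of the abstract:
    (1) latent-factor tag consistency  ||T - B U||_F^2, where the observed
        tag matrix T (n x m) should be reconstructable from the codes
        B (n x k) through tag factors U (k x m);
    (2) topic-similarity preservation, written via the graph Laplacian
        L = D - S: tr(B^T L B) equals (1/2) * sum_ij S_ij ||b_i - b_j||^2,
        where S (n x n) holds pairwise topic similarities."""
    tag_term = np.linalg.norm(T - B @ U) ** 2
    L = np.diag(S.sum(axis=1)) - S
    smooth_term = np.trace(B.T @ L @ B)
    return tag_term + lam * smooth_term
```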
European Conference on Computer Vision | 2014
Qifan Wang; Luo Si; Dan Zhang
Similarity search is an important technique in many large scale vision applications. Hashing approaches have become popular for similarity search due to their computational and memory efficiency. Recently, it has been shown that hashing quality can be improved by incorporating supervised information, e.g. semantic tags/labels, into hashing function learning. However, tag information is not fully exploited in existing unsupervised and supervised hashing methods, especially when only partial tags are available. This paper proposes a novel semi-supervised tag hashing (SSTH) approach that fully incorporates tag information into learning an effective hashing function by exploring the correlation between tags and hashing bits. The hashing function is learned in a unified framework by simultaneously ensuring tag consistency and preserving the similarities between image examples. An iterative coordinate descent algorithm is designed as the optimization procedure. Furthermore, we improve the effectiveness of the hashing function through an orthogonal transformation that minimizes the quantization error. Extensive experiments on two large scale image datasets demonstrate the superior performance of the proposed approach over several state-of-the-art hashing methods.
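The distinguishing point of SSTH is that tag supervision is only partial, so a composite loss might enforce tag consistency only on the observed entries while preserving pairwise image similarities for all examples. The form below is an illustrative assumption, not the paper's objective.

```python
import numpy as np

def ssth_style_loss(B, T, M, C, S, alpha=1.0):
    """Semi-supervised loss sketch: tag consistency is enforced only where
    tags are observed (0/1 mask M), while pairwise similarities are
    preserved for all examples.
    B: (n, k) relaxed codes; T: (n, m) tags; M: (n, m) observation mask;
    C: (k, m) tag-bit correlation matrix; S: (n, n) image similarities."""
    # consistency between codes and *observed* tags via correlation C
    tag_term = np.linalg.norm(M * (T - B @ C)) ** 2
    # similarity preservation via the graph Laplacian L = D - S
    L = np.diag(S.sum(axis=1)) - S
    sim_term = np.trace(B.T @ L @ B)
    return tag_term + alpha * sim_term
```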
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2014
Ying Zhang; Ning Zhang; Luo Si; Yanshan Lu; Qifan Wang; Xiaojie Yuan
In many online news services, users often write comments on news articles that express subjective emotions such as sadness, happiness or anger. Knowing such emotions can help understand the preferences and perspectives of individual users, and therefore may help online publishers provide more relevant services to users. Although building emotion classifiers is a practical task, it depends heavily on sufficient training data, which is not easy to collect directly, and manually labeling comments can be quite labor intensive. Moreover, online news spans different domains, which makes the problem even harder, as the differing word distributions of the domains require different classifiers with correspondingly distinct training data. This paper addresses the task of emotion tagging for comments on cross-domain online news. The cross-domain task is formulated as a transfer learning problem that utilizes a small amount of labeled data from a target news domain and abundant labeled data from a different source domain. This paper proposes a novel framework to transfer knowledge across different news domains. More specifically, different approaches are proposed for the cases where the two domains share the same set of emotion categories and where they use different categories. An extensive set of experimental results on four datasets from popular online news services demonstrates the effectiveness of the proposed models in cross-domain emotion tagging for news comments, both when the source and target domains share the same emotion categories and when their categories differ.
Conference on Information and Knowledge Management | 2013
Qifan Wang; Dan Zhang; Luo Si
Similarity search, or finding approximate nearest neighbors, is an important technique for many applications. Much recent research demonstrates that hashing methods can achieve promising results for large scale similarity search due to their computational and memory efficiency. However, most existing hashing methods treat all hashing bits equally and calculate the distance between data examples as the Hamming distance between their hashing codes, even though different hashing bits may carry different amounts of information. This paper proposes a novel method, named Weighted Hashing (WeiHash), to assign different weights to different hashing bits. The hashing codes and their corresponding weights are jointly learned in a unified framework by simultaneously preserving the similarity between data examples and balancing the variance of each hashing bit. An iterative coordinate descent optimization algorithm is designed to derive the desired hashing codes and weights. Extensive experiments on two large scale datasets demonstrate the superior performance of the proposed approach over several state-of-the-art hashing methods.
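The core idea, replacing the plain Hamming distance with a per-bit weighted distance, can be sketched directly; the learning of the codes and weights themselves is not shown, and the function below only illustrates how such weights would be used at query time.

```python
import numpy as np

def weighted_hamming(query_code, db_codes, weights):
    """Weighted Hamming distance: each mismatched bit contributes its learned
    weight rather than a constant 1.
    query_code: (k,) in {0,1}; db_codes: (n, k) in {0,1};
    weights: (k,) nonnegative per-bit weights (assumed learned elsewhere)."""
    mismatches = db_codes != query_code        # (n, k) boolean
    return mismatches @ weights                # (n,) weighted distances

# toy usage: rank a database of 1000 32-bit codes against one query
rng = np.random.default_rng(0)
db = rng.integers(0, 2, size=(1000, 32))
q = rng.integers(0, 2, size=32)
w = rng.random(32)
nearest = np.argsort(weighted_hamming(q, db, w))[:10]
```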
European Conference on Computer Vision | 2012
Qifan Wang; Luo Si; Dan Zhang
Multiple Instance Learning (MIL) has been widely used in various applications including image classification. However, existing MIL methods do not explicitly address the multi-target problem, where the distributions of positive instances are likely to be multi-modal. This strongly limits the performance of multiple instance learning in many real world applications. To address this problem, this paper proposes a novel discriminative data-dependent mixture-model approach for multiple instance learning (MM-MIL) in image classification. The new method explicitly handles the multi-target problem by introducing a data-dependent mixture model, which allows positive instances to come from different clusters in a flexible manner. Furthermore, the kernelized representation of the proposed model allows effective and efficient learning in high dimensional feature space. An extensive set of experimental results demonstrates that the proposed MM-MIL approach substantially outperforms several state-of-the-art MIL algorithms on benchmark datasets.
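To give a flavor of the multi-modal-positive idea, the sketch below scores a bag by its best instance under a log-ratio between a mixture over several positive clusters and a single negative model; this fixed Gaussian construction is only illustrative, not the paper's kernelized discriminative mixture model.

```python
import numpy as np

def log_gauss(x, mean, var=1.0):
    """Log density of an isotropic Gaussian."""
    d = x.shape[-1]
    return -0.5 * (np.sum((x - mean) ** 2, axis=-1) / var
                   + d * np.log(2 * np.pi * var))

def bag_score(bag, pos_means, pos_weights, neg_mean):
    """Multi-modal MIL scoring sketch: a positive instance only needs to be
    close to *one* positive cluster, and a bag is scored by its best instance.
    bag: (n_inst, d); pos_means: list of (d,) cluster means;
    pos_weights: mixture weights summing to 1; neg_mean: (d,)."""
    # mixture log-likelihood under the positive model, per instance
    comp = np.stack([np.log(w) + log_gauss(bag, m)
                     for w, m in zip(pos_weights, pos_means)])
    pos_ll = np.logaddexp.reduce(comp, axis=0)
    neg_ll = log_gauss(bag, neg_mean)
    return np.max(pos_ll - neg_ll)   # bag label driven by its most positive instance
```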
Conference on Information and Knowledge Management | 2013
Qifan Wang; Lingyun Ruan; Zhiwei Zhang; Luo Si
Tags have been popularly utilized in many applications with image and text data for better managing, organizing and searching for useful information. Tag completion provides missing tag information for a set of existing images or text documents, while tag prediction recommends tag information for any new image or text document. Valuable prior research has focused on improving the accuracy of tag completion and prediction, but limited research has been conducted on the efficiency issue in tag completion and prediction, which is a critical problem in many large scale real world applications. This paper proposes a novel efficient Hashing approach for Tag Completion and Prediction (HashTCP). In particular, we construct compact hashing codes for both data examples and tags such that the observed tags are consistent with the constructed hashing codes and the similarities between data examples are also preserved. We then formulate the problem of learning binary hashing codes as a discrete optimization problem. An efficient coordinate descent method is developed as the optimization procedure for the relaxed problem. A novel binarization method based on orthogonal transformation is proposed to obtain the binary codes from the relaxed solution. Experimental results on four datasets demonstrate that the proposed approach can achieve accuracy similar to or even better than state-of-the-art methods while being much more efficient, which is important for large scale applications.
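Given codes for both examples and tags, the completion step can be sketched as ranking unobserved tags by how well their codes agree with an example's code; the scoring rule and names below are illustrative assumptions rather than the exact HashTCP procedure.

```python
import numpy as np

def complete_tags(example_codes, tag_codes, observed, top_k=5):
    """Tag-completion sketch: both data examples and tags carry compact
    binary codes, and an example's missing tags are ranked by the agreement
    between the tag codes and its own code (inner product of +/-1 codes).
    example_codes: (n, k) in {-1,+1}; tag_codes: (m, k) in {-1,+1};
    observed: (n, m) 0/1 matrix of already-present tags."""
    agreement = (example_codes @ tag_codes.T).astype(float)   # (n, m)
    agreement[observed.astype(bool)] = -np.inf    # do not re-suggest known tags
    return np.argsort(-agreement, axis=1)[:, :top_k]          # suggested tag indices
```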
Multimedia Tools and Applications | 2016
Bin Shen; Bao-Di Liu; Qifan Wang
Dictionary learning plays a key role in image representation for classification. A multi-modal dictionary is usually learned from feature samples across different classes and shared in the feature encoding process. Ideally, each atom in the dictionary corresponds to a single class of images, while each class of images corresponds to a certain group of atoms. Image features are encoded as linear combinations of selected atoms in a given dictionary. We propose to use the elastic net as the regularizer to select atoms in feature coding and the related dictionary learning process, which not only benefits from sparsity similar to that of the ℓ1 penalty but also encourages a grouping effect that helps improve image representation. Experimental results of image classification on benchmark datasets show that the dictionary learned in the proposed way outperforms those produced by state-of-the-art dictionary learning algorithms.
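A minimal sketch of elastic net feature coding against a fixed dictionary is shown below, using plain coordinate descent; the dictionary update of the full learning algorithm is omitted, and the weighting of the penalty terms is an assumption.

```python
import numpy as np

def elastic_net_code(x, D, lam1=0.1, lam2=0.1, n_iter=100):
    """Encode a feature vector x as a combination of dictionary atoms
    (columns of D) under an elastic net penalty, via coordinate descent:
        min_a ||x - D a||^2 + lam1 * ||a||_1 + lam2 * ||a||_2^2
    The l1 part gives sparsity, the l2 part the grouping effect mentioned
    in the abstract.  The dictionary update step is not shown."""
    d, k = D.shape
    a = np.zeros(k)
    col_sq = np.sum(D ** 2, axis=0)            # ||d_j||^2 for each atom
    for _ in range(n_iter):
        for j in range(k):
            # residual excluding atom j's current contribution
            r = x - D @ a + D[:, j] * a[j]
            rho = D[:, j] @ r
            # soft-thresholding update for the elastic net objective
            a[j] = np.sign(rho) * max(abs(rho) - lam1 / 2, 0.0) / (col_sq[j] + lam2)
    return a

# toy usage: code a random vector against a random 64-atom dictionary
rng = np.random.default_rng(0)
D = rng.standard_normal((32, 64))
x = rng.standard_normal(32)
code = elastic_net_code(x, D)
```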
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2014
Qifan Wang; Luo Si; Zhiwei Zhang; Ning Zhang
Similarity search is an important problem in many large scale applications such as image and text retrieval. Hashing methods have become popular for similarity search due to their fast search speed and low storage cost. Recent research has shown that hashing quality can be dramatically improved by incorporating supervised information, e.g. semantic tags/labels, into hashing function learning. However, most existing supervised hashing methods can be regarded as passive methods, which assume that labeled data are provided in advance. But in many real world applications, such supervised information may not be available. This paper proposes a novel active hashing approach, Active Hashing with Joint Data Example and Tag Selection (AH-JDETS), which actively selects the most informative data examples and tags in a joint manner for hashing function learning. In particular, it first identifies a set of informative data examples and tags for users to label, based on the selection criteria that the data examples and tags should be highly uncertain and dissimilar from each other. Then this labeled information is combined with the unlabeled data to generate an effective hashing function. An iterative procedure is proposed for learning the optimal hashing function and selecting the most informative data examples and tags. Extensive experiments on four different datasets demonstrate that AH-JDETS achieves good performance compared with state-of-the-art supervised hashing methods while requiring much less labeling cost, which overcomes the limitation of passive hashing methods. Furthermore, experimental results also indicate that the joint active selection approach outperforms both a random (non-active) selection method and active selection methods that focus only on either data examples or tags.
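The selection criteria (highly uncertain and mutually dissimilar items) suggest a simple greedy procedure like the sketch below; the actual AH-JDETS objective jointly optimizes over data examples and tags, which this illustration does not reproduce.

```python
import numpy as np

def select_informative(uncertainty, similarity, budget, beta=0.5):
    """Greedy selection sketch: repeatedly pick the item (data example or tag)
    with the highest uncertainty, penalized by its similarity to anything
    already selected, so the chosen set is both uncertain and diverse.
    uncertainty: (n,) scores; similarity: (n, n) pairwise similarities."""
    chosen = []
    for _ in range(budget):
        penalty = np.zeros(len(uncertainty))
        if chosen:
            # penalize items similar to anything already selected
            penalty = similarity[:, chosen].max(axis=1)
        scores = uncertainty - beta * penalty
        scores[chosen] = -np.inf                # do not pick the same item twice
        chosen.append(int(np.argmax(scores)))
    return chosen
```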
International Conference on Image Processing | 2014
Bin Shen; Bao-Di Liu; Qifan Wang; Rongrong Ji
Nonnegative Matrix Factorization (NMF) is a widely used technique in many applications such as face recognition, motion segmentation, etc. It approximates the nonnegative data in an original high dimensional space with a linear representation in a low dimensional space by using the product of two nonnegative matrices. In many applications, data are often partially corrupted with large additive noise. When the positions of the noise are known, some existing variants of NMF can be applied by treating these corrupted entries as missing values. However, the positions are often unknown in many real world applications, which prevents the use of traditional NMF or other existing variants of NMF. This paper proposes a Robust Nonnegative Matrix Factorization (RobustNMF) algorithm that explicitly models the partial corruption as large additive noise without requiring information about the positions of the noise. In particular, the proposed method jointly approximates the clean data matrix with the product of two nonnegative matrices and estimates the positions and values of outliers/noise. An efficient iterative optimization algorithm with a solid theoretical justification is proposed to learn the desired matrix factorization. Experimental results demonstrate the advantages of the proposed algorithm.
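The model described above, clean nonnegative factors plus a sparse outlier term at unknown positions, can be sketched with a common alternating heuristic: multiplicative NMF updates on outlier-corrected data, then soft-thresholding of the residual to re-estimate the outliers. This is an illustrative assumption, not the paper's algorithm or its theoretical analysis.

```python
import numpy as np

def robust_nmf_sketch(X, rank, lam=0.5, n_iter=200, eps=1e-9, seed=0):
    """Alternating heuristic for X ~ W H + S, where W, H >= 0 capture the
    clean low-rank structure and the sparse matrix S absorbs large additive
    noise at unknown positions."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    S = np.zeros((n, m))
    for _ in range(n_iter):
        # data with the current outlier estimate removed (kept nonnegative)
        Xc = np.maximum(X - S, 0)
        # standard multiplicative NMF updates on the corrected data
        W *= (Xc @ H.T) / (W @ H @ H.T + eps)
        H *= (W.T @ Xc) / (W.T @ W @ H + eps)
        # re-estimate outliers by soft-thresholding the residual
        R = X - W @ H
        S = np.sign(R) * np.maximum(np.abs(R) - lam, 0)
    return W, H, S
```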