Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Naiyan Wang is active.

Publication


Featured research published by Naiyan Wang.


Knowledge Discovery and Data Mining | 2015

Collaborative Deep Learning for Recommender Systems

Hao Wang; Naiyan Wang; Dit Yan Yeung

Collaborative filtering (CF) is a successful approach commonly used by many recommender systems. Conventional CF-based methods use the ratings given to items by users as the sole source of information for learning to make recommendations. However, the ratings are often very sparse in many applications, causing CF-based methods to degrade significantly in recommendation performance. To address this sparsity problem, auxiliary information such as item content can be utilized. Collaborative topic regression (CTR) is an appealing recent method that takes this approach and tightly couples the two components that learn from the two different sources of information. Nevertheless, the latent representation learned by CTR may not be very effective when the auxiliary information is itself very sparse. To address this problem, we generalize recent advances in deep learning from i.i.d. input to non-i.i.d. (CF-based) input and propose a hierarchical Bayesian model called collaborative deep learning (CDL), which jointly performs deep representation learning for the content information and collaborative filtering for the ratings (feedback) matrix. Extensive experiments on three real-world datasets from different domains show that CDL can significantly advance the state of the art.
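
The coupling at the heart of CDL can be made concrete with a short sketch. The following is a minimal, illustrative Python rendering of the joint objective, assuming a one-layer autoencoder in place of the paper's stacked denoising autoencoder; all shapes and the `lam_v`/`lam_n` trade-off weights are assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, vocab, k = 50, 40, 300, 10
R = rng.integers(0, 2, (n_users, n_items)).astype(float)   # implicit feedback
mask = rng.random((n_users, n_items)) < 0.1                # observed entries (sparse)
X = rng.random((n_items, vocab))                           # item content (e.g. bag-of-words)

W_enc = rng.normal(0, 0.01, (vocab, k))   # encoder (stand-in for the SDAE)
W_dec = rng.normal(0, 0.01, (k, vocab))   # decoder
U = rng.normal(0, 0.1, (n_users, k))      # user latent factors
V = rng.normal(0, 0.1, (n_items, k))      # item latent factors

lam_v, lam_n = 1.0, 1.0   # coupling / reconstruction weights (assumed)

def cdl_objective():
    H = np.tanh(X @ W_enc)                          # content representation
    recon = ((X - H @ W_dec) ** 2).mean()           # deep representation learning term
    rating = ((mask * (R - U @ V.T)) ** 2).mean()   # collaborative filtering term
    couple = ((V - H) ** 2).mean()                  # ties item factors to content codes
    return rating + lam_v * couple + lam_n * recon
```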


International Conference on Computer Vision | 2013

Online Robust Non-negative Dictionary Learning for Visual Tracking

Naiyan Wang; Jingdong Wang; Dit Yan Yeung

This paper studies the visual tracking problem in video sequences and presents a novel robust sparse tracker under the particle filter framework. In particular, we propose an online robust non-negative dictionary learning algorithm for updating the object templates so that each learned template captures a distinctive aspect of the tracked object. Another appealing property of this approach is that it can automatically detect and reject occlusion and cluttered background in a principled way. In addition, we propose a new particle representation formulation using the Huber loss function. The advantage is that it yields robust estimation without the trivial templates adopted by previous sparse trackers, leading to faster computation. We also reveal the equivalence between this new formulation and the previous one that uses trivial templates. The proposed tracker is empirically compared with state-of-the-art trackers on several challenging video sequences. Both quantitative and qualitative comparisons show that our proposed tracker is superior and more stable.
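
The Huber-loss particle representation can be sketched briefly. Below is an illustrative iteratively reweighted least-squares (IRLS) coding step under the Huber penalty, with a crude non-negativity projection; `delta`, the iteration count, and the projection are assumptions, not the paper's exact optimization.

```python
import numpy as np

def huber(r, delta=1.0):
    """Huber loss: quadratic for small residuals, linear for large ones.
    The IRLS weights below are the standard weights for this penalty."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

def robust_code(D, y, delta=1.0, iters=20):
    """Code particle y over template dictionary D (columns are templates)."""
    c = np.maximum(np.linalg.lstsq(D, y, rcond=None)[0], 0.0)
    for _ in range(iters):
        r = y - D @ c
        w = np.where(np.abs(r) <= delta, 1.0, delta / (np.abs(r) + 1e-12))
        Dw = D * w[:, None]                      # rows of D scaled by IRLS weights
        A = Dw.T @ D + 1e-6 * np.eye(D.shape[1])
        c = np.linalg.solve(A, Dw.T @ y)         # weighted least-squares step
        c = np.maximum(c, 0.0)                   # crude non-negativity projection
    return c
```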


International Conference on Computer Vision | 2015

Understanding and Diagnosing Visual Tracking Systems

Naiyan Wang; Jianping Shi; Dit Yan Yeung; Jiaya Jia

Several benchmark datasets for visual tracking research have been created in recent years. Despite their usefulness, whether they are sufficient for understanding and diagnosing the strengths and weaknesses of different trackers remains questionable. To address this issue, we propose a framework that breaks a tracker down into five constituent parts: motion model, feature extractor, observation model, model updater, and ensemble post-processor. We then conduct ablative experiments on each component to study how it affects the overall result. Surprisingly, our findings contradict some common beliefs in the visual tracking research community. We find that the feature extractor plays the most important role in a tracker. On the other hand, although the observation model is the focus of many studies, we find that it often brings no significant improvement. Moreover, the motion model and model updater contain many details that can affect the result. Also, the ensemble post-processor can improve the result substantially when the constituent trackers have high diversity. Based on our findings, we put together some very elementary building blocks to obtain a basic tracker that is competitive in performance with state-of-the-art trackers. We believe our framework can provide a solid baseline when conducting controlled experiments for visual tracking research.
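
The five-part decomposition lends itself to a plain pipeline skeleton. The interface below is hypothetical and only illustrates how the components interact per frame; the paper's concrete choices for each part are not reproduced here.

```python
class Tracker:
    """Skeleton of the five-component view: motion model, feature extractor,
    observation model, model updater, and (optional) ensemble post-processor."""

    def __init__(self, motion_model, feature_extractor, observation_model,
                 model_updater, ensemble_post_processor=None):
        self.motion = motion_model            # proposes candidate regions
        self.features = feature_extractor     # e.g. raw pixels, HOG, deep features
        self.observe = observation_model      # scores candidates against the target
        self.update = model_updater           # decides when/how to adapt the model
        self.ensemble = ensemble_post_processor  # fuses several trackers' outputs

    def step(self, frame, prev_box):
        candidates = self.motion.sample(prev_box)
        scores = [self.observe.score(self.features.extract(frame, c))
                  for c in candidates]
        box = candidates[scores.index(max(scores))]
        self.update.maybe_update(frame, box)   # hypothetical updater hook
        return box
```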


Computer Vision and Pattern Recognition | 2015

DevNet: A Deep Event Network for multimedia event detection and evidence recounting

Chuang Gan; Naiyan Wang; Yi Yang; Dit Yan Yeung; Alexander G. Hauptmann

In this paper, we focus on complex event detection in internet videos while also providing the key evidence for the detection results. Convolutional Neural Networks (CNNs) have achieved promising performance in image classification and action recognition tasks. However, how to use CNNs for video event detection and recounting remains an open problem, mainly due to the complexity and diversity of video events. In this work, we propose a flexible deep CNN infrastructure, namely Deep Event Network (DevNet), that simultaneously detects pre-defined events and provides key spatial-temporal evidence. Taking key frames of videos as input, we first detect the event of interest at the video level by aggregating the CNN features of the key frames. The pieces of evidence that recount the detection results are also localized automatically, both temporally and spatially. The challenge is that we only have video-level labels, while the key evidence usually occurs at the frame level. Based on an intrinsic property of CNNs, we first generate a spatial-temporal saliency map by a backward pass through DevNet, which can then be used to find the key frames most indicative of the event and to localize the most indicative spatial region, usually an object, within each frame. Experiments on the large-scale TRECVID 2014 MEDTest dataset demonstrate the promising performance of our method for both event detection and evidence recounting.
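
The backward-pass idea can be illustrated with standard gradient-based saliency. The PyTorch sketch below assumes a hypothetical per-frame event classifier `model`; it is an analogous construction, not DevNet's exact procedure.

```python
import torch

def saliency_map(model, key_frames, event_idx):
    """Spatial-temporal saliency via a backward pass: large input gradients
    mark the pixels (and, summed per frame, the key frames) most indicative
    of the event. `model` maps (T, 3, H, W) frames to (T, num_events) scores."""
    x = key_frames.clone().requires_grad_(True)
    scores = model(x)
    scores[:, event_idx].sum().backward()         # backward pass through the CNN
    sal = x.grad.abs().amax(dim=1)                # (T, H, W) spatial saliency
    frame_scores = sal.flatten(1).sum(dim=1)      # rank key frames temporally
    return sal, frame_scores
```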


European Conference on Computer Vision | 2012

A probabilistic approach to robust matrix factorization

Naiyan Wang; Tiansheng Yao; Jingdong Wang; Dit Yan Yeung

Matrix factorization underlies a large variety of computer vision applications. It is a particularly challenging problem for large-scale applications and when there exist outliers and missing data. In this paper, we propose a novel probabilistic model called Probabilistic Robust Matrix Factorization (PRMF) to solve this problem. In particular, PRMF is formulated with a Laplace error and a Gaussian prior which correspond to an l1 loss and an l2 regularizer, respectively. For model learning, we devise a parallelizable expectation-maximization (EM) algorithm which can potentially be applied to large-scale applications. We also propose an online extension of the algorithm for sequential data to offer further scalability. Experiments conducted on both synthetic data and some practical computer vision applications show that PRMF is comparable to other state-of-the-art robust matrix factorization methods in terms of accuracy and outperforms them particularly for large data matrices.
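
The l1/l2 correspondence gives PRMF a simple alternating flavor. The sketch below is an illustrative EM-style loop in which the E-step turns each residual into a weight roughly proportional to 1/|residual| (the Laplace scale-mixture view) and the M-step is weighted ridge regression; the rank, regularization, and iteration counts are assumptions, and the paper's parallel and online variants are not shown.

```python
import numpy as np

def prmf_sketch(Y, rank=5, lam=0.1, iters=30, eps=1e-6):
    """Robust matrix factorization with Laplace noise via EM/IRLS (toy version)."""
    m, n = Y.shape
    rng = np.random.default_rng(0)
    U, V = rng.normal(size=(m, rank)), rng.normal(size=(n, rank))
    for _ in range(iters):
        W = 1.0 / (np.abs(Y - U @ V.T) + eps)        # E-step: per-entry weights
        for i in range(m):                           # M-step: weighted ridge rows
            A = (V * W[i][:, None]).T @ V + lam * np.eye(rank)
            U[i] = np.linalg.solve(A, (V * W[i][:, None]).T @ Y[i])
        for j in range(n):
            A = (U * W[:, j][:, None]).T @ U + lam * np.eye(rank)
            V[j] = np.linalg.solve(A, (U * W[:, j][:, None]).T @ Y[:, j])
    return U, V
```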


International Conference on Computer Vision | 2013

Bayesian Robust Matrix Factorization for Image and Video Processing

Naiyan Wang; Dit Yan Yeung

Matrix factorization is a fundamental problem that is often encountered in many computer vision and machine learning tasks. In recent years, enhancing the robustness of matrix factorization methods has attracted much attention in the research community. To benefit from the strengths of full Bayesian treatment over point estimation, we propose here a full Bayesian approach to robust matrix factorization. For the generative process, the model parameters have conjugate priors and the likelihood (or noise model) takes the form of a Laplace mixture. For Bayesian inference, we devise an efficient sampling algorithm by exploiting a hierarchical view of the Laplace distribution. Besides the basic model, we also propose an extension which assumes that the outliers exhibit spatial or temporal proximity as encountered in many computer vision applications. The proposed methods give competitive experimental results when compared with several state-of-the-art methods on some benchmark image and video processing tasks.
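
The hierarchical view of the Laplace distribution that the sampler exploits is easy to verify numerically: a Laplace variable is a Gaussian whose variance is itself exponentially distributed, which is what makes conjugate Gibbs-style updates tractable. A small sanity check, with an assumed scale:

```python
import numpy as np

rng = np.random.default_rng(0)
b = 1.5                                              # Laplace scale (assumed)
v = rng.exponential(scale=2 * b**2, size=100_000)    # latent per-entry variances
x = rng.normal(0.0, np.sqrt(v))                      # Gaussian given its variance
# x is (approximately) Laplace(0, b); its mean absolute value should be ~b.
print(np.abs(x).mean())                              # close to 1.5
```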


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2014

Trinary-Projection Trees for Approximate Nearest Neighbor Search

Jingdong Wang; Naiyan Wang; You Jia; Jian Li; Gang Zeng; Hongbin Zha; Xian-Sheng Hua

We address the problem of approximate nearest neighbor (ANN) search for visual descriptor indexing. Most spatial partition trees, such as KD trees, VP trees, and so on, follow the hierarchical binary space partitioning framework. The key effort is to design different partition functions (hyperplane or hypersphere) to divide the points so that 1) the data points can be well grouped to support effective NN candidate location and 2) the partition functions can be quickly evaluated to support efficient NN candidate location. We design a trinary-projection direction-based partition function. The trinary-projection direction is defined as a combination of a few coordinate axes with the weights being 1 or -1. We pursue the projection direction using the widely adopted maximum variance criterion to guarantee good space partitioning and find fewer coordinate axes to guarantee efficient partition function evaluation. We present a coordinate-wise enumeration algorithm to find the principal trinary-projection direction. In addition, we provide an extension using multiple randomized trees for improved performance. We justify our approach on large-scale local patch indexing and similar image search.
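
A trinary projection itself is cheap to evaluate, which is the point of the construction: entries of the direction are -1, 0, or +1, so projecting a point is a few additions and subtractions rather than a full dot product. The toy split below enumerates the sign patterns over a few high-variance coordinates and keeps the maximum-variance projection; it illustrates the idea rather than the paper's coordinate-wise enumeration algorithm.

```python
from itertools import product
import numpy as np

def trinary_split(points, num_axes=3):
    """Split points at the median of a maximum-variance trinary projection."""
    var = points.var(axis=0)
    axes = np.argsort(var)[-num_axes:]               # few high-variance coordinates
    best = None
    for signs in product((-1.0, 1.0), repeat=num_axes):  # enumerate +/-1 weights
        proj = points[:, axes] @ np.array(signs)     # cheap: sums/differences
        if best is None or proj.var() > best[0]:
            best = (proj.var(), np.array(signs), proj)
    _, signs, proj = best
    threshold = np.median(proj)
    return proj <= threshold, axes, signs, threshold
```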


Computer Vision and Pattern Recognition | 2015

Bayesian adaptive matrix factorization with automatic model selection

Peixian Chen; Naiyan Wang; Nevin Lianwen Zhang; Dit Yan Yeung

Low-rank matrix factorization has long been recognized as a fundamental problem in many computer vision applications. Nevertheless, the reliability of existing matrix factorization methods is often hard to guarantee due to challenges brought by such model selection issues as selecting the noise model and determining the model capacity. We address these two issues simultaneously in this paper by proposing a robust non-parametric Bayesian adaptive matrix factorization (AMF) model. AMF employs a new noise model built on the Dirichlet process Gaussian mixture model (DP-GMM), taking advantage of its high flexibility in selecting the number of components and its capability of fitting a wide range of unknown noise. AMF also imposes an automatic relevance determination (ARD) prior on the low-rank factor matrices so that the rank can be determined automatically without enforcing any hard constraint. An efficient variational method is then devised for model inference. We compare AMF with state-of-the-art matrix factorization methods on datasets ranging from synthetic data to real-world application data. The results show that AMF consistently achieves better or comparable performance.
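
The ARD mechanism for automatic rank selection can be sketched in isolation. Below, each column pair of the factor matrices shares a precision with a Gamma prior; columns whose posterior-mean precision blows up carry essentially no energy and are pruned, so the effective rank falls out of inference rather than being fixed in advance. The hyperparameters and threshold are assumptions.

```python
import numpy as np

def ard_prune(U, V, threshold=1e3, a0=1e-3, b0=1e-3):
    """Prune factor columns whose ARD precision indicates they are unused."""
    m, n = U.shape[0], V.shape[0]
    energy = (U**2).sum(axis=0) + (V**2).sum(axis=0)       # per-column energy
    gamma = (a0 + 0.5 * (m + n)) / (b0 + 0.5 * energy)     # posterior-mean precisions
    keep = gamma < threshold                               # huge precision => dead column
    return U[:, keep], V[:, keep], gamma
```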


Neural Information Processing Systems | 2013

Learning a Deep Compact Image Representation for Visual Tracking

Naiyan Wang; Dit Yan Yeung


International Conference on Machine Learning | 2014

Ensemble-Based Tracking: Aggregating Crowdsourced Structured Time Series Data

Naiyan Wang; Dit Yan Yeung

Collaboration


Dive into Naiyan Wang's collaborations.

Top Co-Authors

Dit Yan Yeung
Hong Kong University of Science and Technology

Jianping Shi
The Chinese University of Hong Kong

Jiaya Jia
The Chinese University of Hong Kong

Hao Wang
Hong Kong University of Science and Technology

Irwin King
The Chinese University of Hong Kong

Nevin Lianwen Zhang
Hong Kong University of Science and Technology

Peixian Chen
Hong Kong University of Science and Technology

Yang Xia
The Chinese University of Hong Kong