Publication


Featured research published by Yongjian Wu.


Computer Vision and Pattern Recognition | 2017

Cross-Modality Binary Code Learning via Fusion Similarity Hashing

Hong Liu; Rongrong Ji; Yongjian Wu; Feiyue Huang; Baochang Zhang

Binary code learning has recently emerged as a popular topic in large-scale cross-modality retrieval. It aims to map features from multiple modalities into a common Hamming space, where cross-modality similarity can be approximated efficiently via Hamming distance. To this end, most existing works learn binary codes directly from data instances in multiple modalities, preserving the intra- and inter-modal similarities respectively. Few methods consider preserving the fusion similarity among multi-modal instances instead, which can explicitly capture their heterogeneous correlation in cross-modality retrieval. In this paper, we propose a hashing scheme, termed Fusion Similarity Hashing (FSH), which explicitly embeds the graph-based fusion similarity across modalities into a common Hamming space. Inspired by fusion by diffusion, our core idea is to construct an undirected asymmetric graph to model the fusion similarity among different modalities, upon which a graph hashing scheme with alternating optimization is introduced to learn binary codes that embed such fusion similarity. Quantitative evaluations on three widely used benchmarks, i.e., UCI Handwritten Digit, MIR-Flickr25K and NUS-WIDE, demonstrate that the proposed FSH approach achieves superior performance over the state-of-the-art methods.
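
The efficiency argument above rests on the fact that Hamming distance between binary codes reduces to bitwise XOR and popcount. A minimal sketch of cross-modal Hamming ranking (not from the paper; the 64-bit code length and packing scheme are assumptions for illustration):

```python
import numpy as np

def hamming_distance(codes_a, codes_b):
    """Pairwise Hamming distances between two sets of bit-packed binary codes.

    codes_a: (n, B) uint8 array, each row a packed binary code
    codes_b: (m, B) uint8 array
    returns: (n, m) matrix of Hamming distances
    """
    # XOR highlights differing bits; unpacking and summing counts them.
    xor = codes_a[:, None, :] ^ codes_b[None, :, :]        # (n, m, B)
    return np.unpackbits(xor, axis=-1).sum(axis=-1)         # (n, m)

# Toy cross-modal retrieval: 64-bit codes for image and text instances.
rng = np.random.default_rng(0)
img_codes = np.packbits(rng.integers(0, 2, size=(5, 64), dtype=np.uint8), axis=1)
txt_codes = np.packbits(rng.integers(0, 2, size=(3, 64), dtype=np.uint8), axis=1)
print(hamming_distance(img_codes, txt_codes))
```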


ACM Multimedia | 2017

More Than An Answer: Neural Pivot Network for Visual Question Answering

Yiyi Zhou; Rongrong Ji; Jinsong Su; Yongjian Wu; Yunsheng Wu

Most existing works in visual question answering (VQA) are dedicated to improving the performance of answer prediction, while leaving the explanation of answering unexplored. We argue that exploiting the explanations of question answering not only makes VQA explainable, but also quantitatively improves prediction performance. In this paper, we propose a novel network architecture, termed Neural Pivot Network (NPN), towards simultaneous VQA and explanation generation in a multi-task learning architecture. NPN is trained using both image-caption and image-question-answer pairs. In principle, CNN-based deep visual features are extracted and sent to both the VQA channel and the captioning module, the latter of which serves as a pivot to bridge the source image module to the target QA predictor. Such a design enables us to introduce large-scale image-captioning training sets, e.g., MS-COCO Caption and Visual Genome Caption, together with cutting-edge image captioning models, to benefit VQA learning. Quantitatively, the proposed NPN performs significantly better than alternatives and state-of-the-art schemes trained on VQA datasets only. Besides, by investigating the by-products of the experiments, in-depth digests can be provided along with the answers.
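
The multi-task idea, shared visual features feeding both an answer classifier and a caption decoder whose loss acts as an auxiliary pivot, can be sketched as below. The module names, dimensions, and loss weighting are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class PivotMultiTask(nn.Module):
    """Toy multi-task head: shared CNN features drive both VQA and captioning."""
    def __init__(self, feat_dim=2048, num_answers=1000, vocab_size=10000):
        super().__init__()
        self.vqa_head = nn.Linear(feat_dim, num_answers)   # answer classifier
        self.cap_head = nn.Linear(feat_dim, vocab_size)     # stand-in caption decoder

    def forward(self, visual_feats):
        return self.vqa_head(visual_feats), self.cap_head(visual_feats)

model = PivotMultiTask()
feats = torch.randn(4, 2048)                                # CNN features for 4 images
ans_logits, cap_logits = model(feats)
answers = torch.randint(0, 1000, (4,))
captions = torch.randint(0, 10000, (4,))                    # one caption token per image, for brevity
loss = nn.functional.cross_entropy(ans_logits, answers) \
     + 0.5 * nn.functional.cross_entropy(cap_logits, captions)  # captioning loss acts as the pivot
loss.backward()
```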


ACM Multimedia | 2017

StructCap: Structured Semantic Embedding for Image Captioning

Fuhai Chen; Rongrong Ji; Jinsong Su; Yongjian Wu; Yunsheng Wu

Image captioning has attracted ever-increasing research attention in multimedia and computer vision. To encode the visual content, existing approaches typically utilize off-the-shelf deep Convolutional Neural Network (CNN) models to extract visual features, which are sent to Recurrent Neural Network (RNN) based textual generators to output word sequences. More recently, some methods encode visual objects and scene information with attention mechanisms. Despite the promising progress, one distinct disadvantage lies in distinguishing and modeling key semantic entities and their relations, which are widely regarded as important cues for describing image content. In this paper, we propose a novel image captioning model, termed StructCap. It parses a given image into key entities and their relations organized in a visual parsing tree, which is transformed and embedded under an encoder-decoder framework via visual attention. We give an end-to-end formulation to facilitate joint training of the visual tree parser, the structured semantic attention, and the RNN-based captioning modules. Experimental results on two public benchmarks, Microsoft COCO and Flickr30K, show that the proposed StructCap model outperforms state-of-the-art approaches under various standard evaluation metrics.
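
A minimal sketch of the attention step implied above, weighting per-entity features by their relevance to the current decoder state; the shapes and the dot-product scoring function are assumptions, not the paper's exact formulation:

```python
import numpy as np

def soft_attention(entity_feats, query):
    """Attend over entity features given the current decoder hidden state.

    entity_feats: (k, d) features of parsed entities/relations
    query:        (d,)   current decoder hidden state
    returns:      (d,)   attended context vector, plus the attention weights
    """
    scores = entity_feats @ query                 # dot-product relevance per entity
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                      # softmax over entities
    return weights @ entity_feats, weights

entities = np.random.randn(5, 128)                # e.g. 5 nodes of a visual parsing tree
state = np.random.randn(128)
context, attn = soft_attention(entities, state)
print(attn.round(3))
```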


International Joint Conference on Artificial Intelligence | 2018

Centralized Ranking Loss with Weakly Supervised Localization for Fine-Grained Object Retrieval

Xiawu Zheng; Rongrong Ji; Xiaoshuai Sun; Yongjian Wu; Feiyue Huang; Yanhua Yang

Fine-grained object retrieval has attracted extensive research focus recently. Its state-of-the-art schemes are typically based upon convolutional neural network (CNN) features. Despite the extensive progress, two issues remain open. On one hand, deep features are coarsely extracted at the image level rather than precisely at the object level, and are therefore disturbed by background clutter. On the other hand, training CNN features with a standard triplet loss is time consuming and incapable of learning discriminative features. In this paper, we present a novel fine-grained object retrieval scheme that addresses both issues in a unified framework. Firstly, we introduce a novel centralized ranking loss (CRL), which achieves very efficient (a 1,000x training speedup compared to the triplet loss) and discriminative feature learning through a “centralized” global pooling. Secondly, a weakly supervised attractive feature extraction is proposed, which segments object contours with top-down saliency. Consequently, the contours are integrated into the CNN response map to precisely extract features “within” the target object. Interestingly, we have discovered that the combination of CRL and weakly supervised learning reinforce each other. We evaluate the performance of the proposed scheme on widely used benchmarks including CUB200-2011 and CARS196, and report significant gains over the state-of-the-art schemes, e.g., 5.4% over SCDA [Wei et al., 2017] on CARS196 and 3.7% on CUB200-2011.
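
The “centralized” idea, ranking each sample against class centers rather than enumerating triplets, can be sketched as follows. The exact formulation in the paper may differ; the hinge margin and squared-distance comparison here are assumptions:

```python
import numpy as np

def centralized_ranking_loss(feats, labels, margin=0.5):
    """Rank each sample against class centers instead of enumerating triplets.

    feats:  (n, d) L2-normalized embeddings
    labels: (n,)   integer class labels
    Pulls each sample toward its own class center and pushes it at least
    `margin` closer to it than to the nearest other center (hinge loss).
    """
    classes = np.unique(labels)
    centers = np.stack([feats[labels == c].mean(axis=0) for c in classes])   # (C, d)
    d = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)        # (n, C)
    idx = np.searchsorted(classes, labels)
    own = d[np.arange(len(feats)), idx]                                      # distance to own center
    other = np.where(np.eye(len(classes))[idx] > 0, np.inf, d).min(axis=1)   # nearest other center
    return np.maximum(0.0, own - other + margin).mean()

feats = np.random.randn(8, 16)
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])
print(centralized_ranking_loss(feats, labels))
```

Comparing each of n samples against C class centers costs O(nC) per pass, versus the cubic number of candidate triplets, which is the intuition behind the claimed training speedup.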


International Joint Conference on Artificial Intelligence | 2018

Accelerating Convolutional Networks via Global & Dynamic Filter Pruning

Shaohui Lin; Rongrong Ji; Yuchao Li; Yongjian Wu; Feiyue Huang; Baochang Zhang

Accelerating convolutional neural networks has recently received ever-increasing research focus. Among the various approaches proposed in the literature, filter pruning has been regarded as a promising solution, due to its significant speedup and memory reduction for both the network model and the intermediate feature maps. To this end, most approaches prune filters in a layer-wise, fixed manner, which cannot dynamically recover previously removed filters or jointly optimize the pruned network across layers. In this paper, we propose a novel global & dynamic pruning (GDP) scheme to prune redundant filters for CNN acceleration. In particular, GDP first globally prunes unsalient filters across all layers using a global discriminative function based on prior knowledge of each filter. Second, it dynamically updates the filter saliency over the pruned sparse network and recovers mistakenly pruned filters, followed by a retraining phase to improve model accuracy. Specifically, we solve the corresponding non-convex optimization problem of the proposed GDP via stochastic gradient descent with greedy alternating updating. Extensive experiments show that the proposed approach achieves superior performance in accelerating several cutting-edge CNNs on the ILSVRC 2012 benchmark, compared to the state-of-the-art filter pruning methods.
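
A minimal sketch of global (cross-layer) filter masking; the saliency measure here is simply the L1 norm of each filter, a simplification of the learned global discriminative function described above, and the keep ratio is an assumption:

```python
import numpy as np

def global_prune_masks(filters_per_layer, keep_ratio=0.7):
    """Globally mask the least salient filters across all layers at once.

    filters_per_layer: list of (out_channels, in_channels, kh, kw) weight arrays
    Saliency here is just the L1 norm of each filter (a simplification); the
    masks can be re-estimated later so that wrongly pruned filters recover.
    """
    saliency = np.concatenate([np.abs(w).sum(axis=(1, 2, 3)) for w in filters_per_layer])
    threshold = np.quantile(saliency, 1.0 - keep_ratio)     # one global threshold, not per-layer
    masks, start = [], 0
    for w in filters_per_layer:
        n = w.shape[0]
        masks.append((saliency[start:start + n] >= threshold).astype(np.float32))
        start += n
    return masks   # multiply each layer's output channels by its mask in the forward pass

layers = [np.random.randn(16, 3, 3, 3), np.random.randn(32, 16, 3, 3)]
masks = global_prune_masks(layers, keep_ratio=0.6)
print([int(m.sum()) for m in masks])   # number of filters kept per layer
```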


ACM Multimedia | 2018

Supervised Online Hashing via Hadamard Codebook Learning

Mingbao Lin; Rongrong Ji; Hong Liu; Yongjian Wu

In recent years, binary code learning, a.k.a. hashing, has received extensive attention in large-scale multimedia retrieval. It aims to encode high-dimensional data points into binary codes, so that the original high-dimensional metric space can be efficiently approximated via Hamming space. However, most existing hashing methods adopt offline batch learning, which is not suitable for incremental datasets with streaming data or new instances. Meanwhile, the robustness of existing online hashing remains an open problem, and embedding supervised/semantic information hardly boosts the performance of online hashing, mainly due to unknown category numbers in supervised learning. In this paper, we propose an online hashing scheme, termed Hadamard Codebook based Online Hashing (HCOH), which aims to solve the above problems towards robust, supervised online hashing. In particular, we first assign to each class label an appropriate high-dimensional binary code, generated randomly from Hadamard codes. Subsequently, LSH is adopted to reduce the length of such Hadamard codes in accordance with the hash bits, which can adapt the predefined binary codes online and theoretically guarantees semantic similarity. Finally, we consider the setting of stochastic data acquisition, which facilitates efficient online learning of the corresponding hashing functions via stochastic gradient descent (SGD). Notably, the proposed HCOH can be embedded with supervised labels and is not limited to a predefined category number. Extensive experiments on three widely used benchmarks demonstrate the merits of the proposed scheme over the state-of-the-art methods.
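
A minimal sketch of the codebook construction described above, assigning each class a Hadamard row and shortening it with a random (LSH-style) projection; the Gaussian projection and sign binarization are assumptions about the details:

```python
import numpy as np
from scipy.linalg import hadamard

def hadamard_codebook(num_classes, code_len, rng=None):
    """Assign each class label a binary target code derived from a Hadamard matrix.

    Rows of a Hadamard matrix of order >= num_classes are mutually well
    separated; a random Gaussian projection (LSH-style) then adapts each
    row to `code_len` bits when the Hadamard order differs from it.
    """
    rng = np.random.default_rng() if rng is None else rng
    order = 1 << int(np.ceil(np.log2(max(num_classes, 2))))   # next power of two
    H = hadamard(order)[:num_classes].astype(np.float64)       # one +/-1 row per class
    if order == code_len:
        return (H > 0).astype(np.int8)
    W = rng.standard_normal((order, code_len))                 # LSH projection
    return (H @ W > 0).astype(np.int8)

codes = hadamard_codebook(num_classes=10, code_len=32, rng=np.random.default_rng(0))
print(codes.shape)   # (10, 32): one binary target code per class label
```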


ACM Multimedia | 2018

Dense Auto-Encoder Hashing for Robust Cross-Modality Retrieval

Hong Liu; Mingbao Lin; Shengchuan Zhang; Yongjian Wu; Feiyue Huang; Rongrong Ji

Cross-modality retrieval, which aims to search images in response to text queries or vice versa, has been widely studied. When faced with large-scale datasets, cross-modality hashing serves as an efficient and effective solution, learning binary codes to approximate the cross-modality similarity in the Hamming space. Most recent cross-modality hashing schemes focus on learning the hash functions from data instances with complete modalities. However, how to learn robust binary codes when facing incomplete modalities (i.e., with one modality missing or only partially observed) is left unexplored, although this situation occurs widely in real-world applications. In this paper, we propose a novel cross-modality hashing method, termed Dense Auto-encoder Hashing (DAH), which can explicitly impute the missing modality and produce robust binary codes by leveraging the relatedness among different modalities. To that end, we propose a novel Dense Auto-encoder Network (DAN) to impute the missing modalities, which densely connects each layer to every other layer in a feed-forward fashion. For each layer, a noisy auto-encoder block is designed to calculate the residue between the current prediction and the original data. Finally, a hash layer is added to the end of DAN, which serves as a special binary encoder that handles incomplete modality input. Quantitative experiments on three cross-modality visual search benchmarks, i.e., Wiki, NUS-WIDE, and FLICKR-25K, show that the proposed DAH has superior performance over the state-of-the-art approaches.
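
A toy sketch of the dense, residue-accumulating imputation idea, each block sees all earlier outputs and predicts a residual correction toward the missing modality; the layer sizes, activation, and overall structure are assumptions, not the DAN architecture itself:

```python
import torch
import torch.nn as nn

class DenseImputer(nn.Module):
    """Toy densely connected imputer: every block receives all earlier outputs
    and predicts a residual correction toward the missing modality's features."""
    def __init__(self, dim=64, num_blocks=3):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Linear(dim * (i + 1), dim) for i in range(num_blocks)
        )

    def forward(self, observed):
        outputs = [observed]                          # start from the observed modality
        estimate = torch.zeros_like(observed)
        for block in self.blocks:
            residue = torch.tanh(block(torch.cat(outputs, dim=-1)))
            estimate = estimate + residue             # accumulate residual predictions
            outputs.append(residue)                   # dense connection to later blocks
        return estimate                               # imputed features of the missing modality

imputer = DenseImputer()
text_feats = torch.randn(4, 64)                       # observed text modality
imputed_img = imputer(text_feats)                     # stand-in for the missing image modality
print(imputed_img.shape)
```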


IEEE Transactions on Image Processing | 2017

Toward Optimal Manifold Hashing via Discrete Locally Linear Embedding

Rongrong Ji; Hong Liu; Liujuan Cao; Di Liu; Yongjian Wu; Feiyue Huang

Binary code learning, also known as hashing, has received increasing attention in large-scale visual search. By transforming high-dimensional features into binary codes, the original Euclidean distance is approximated via Hamming distance. More recently, it has been advocated that it is the manifold distance, rather than the Euclidean distance, that should be preserved in the Hamming space. However, directly preserving the manifold structure by hashing remains an open problem. In particular, existing approaches first need to build the locally linear embedding in the original feature space and then quantize such an embedding into binary codes. Such two-step coding is problematic and suboptimal. Besides, the offline learning is extremely time and memory consuming, since it needs to calculate the similarity matrix of the original data. In this paper, we propose a novel hashing algorithm, termed discrete locally linear embedding hashing (DLLH), which addresses the above challenges. DLLH directly reconstructs the manifold structure in the Hamming space, learning optimal hash codes that maintain the locally linear relationships of data points. To learn discrete locally linear embedding codes, we further propose a discrete optimization algorithm with an iterative parameter updating scheme. Moreover, an anchor-based acceleration scheme, termed Anchor-DLLH, is introduced, which approximates the large similarity matrix by the product of two low-rank matrices. Experimental results on three widely used benchmark data sets, i.e., CIFAR10, NUS-WIDE, and YouTube Face, show the superior performance of the proposed DLLH over the state-of-the-art approaches.
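
The anchor-based low-rank approximation mentioned at the end can be illustrated as follows; the Gaussian kernel, row normalization, and anchor count are assumptions made for this sketch rather than the paper's exact construction:

```python
import numpy as np

def anchor_affinity(X, anchors, sigma=1.0):
    """Affinity Z between n data points and m anchors (Gaussian kernel, row-normalized)."""
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=-1)   # (n, m) squared distances
    Z = np.exp(-d2 / (2 * sigma ** 2))
    return Z / Z.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 32))
anchors = X[rng.choice(1000, size=50, replace=False)]        # m = 50 anchors sampled from the data
Z = anchor_affinity(X, anchors)
Lam_inv = np.diag(1.0 / Z.sum(axis=0))                        # Lambda^{-1}, with Lambda = diag(Z^T 1)
S_approx = Z @ Lam_inv @ Z.T    # stand-in for the full n x n similarity matrix
print(S_approx.shape)           # (1000, 1000), but built from two rank-50 factors
```

Because the n x n similarity is never stored explicitly, only the n x m factor Z and an m x m diagonal, memory and learning cost drop from quadratic in n to linear, which is the point of the anchor acceleration.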


International Joint Conference on Artificial Intelligence | 2016

Supervised Matrix Factorization for Cross-Modality Hashing

Hong Liu; Rongrong Ji; Yongjian Wu; Gang Hua


National Conference on Artificial Intelligence | 2016

Towards Optimal Binary Code Learning via Ordinal Embedding

Hong Liu; Rongrong Ji; Yongjian Wu; Wei Liu

Collaboration


Dive into Yongjian Wu's collaboration.

Top Co-Authors

Xiaoshuai Sun

Harbin Institute of Technology
