Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Yueting Zhuang is active.

Publication


Featured research published by Yueting Zhuang.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2012

A Multimedia Retrieval Framework Based on Semi-Supervised Ranking and Relevance Feedback

Yi Yang; Feiping Nie; Dong Xu; Jiebo Luo; Yueting Zhuang; Yunhe Pan

We present a new framework for multimedia content analysis and retrieval that consists of two independent algorithms. First, we propose a new semi-supervised algorithm, ranking with Local Regression and Global Alignment (LRGA), to learn a robust Laplacian matrix for data ranking. In LRGA, for each data point, a local linear regression model is used to predict the ranking scores of its neighboring points. A unified objective function is then proposed to globally align the local models from all the data points so that an optimal ranking score can be assigned to each data point. Second, we propose a semi-supervised long-term Relevance Feedback (RF) algorithm to refine the multimedia data representation. The proposed long-term RF algorithm utilizes both the distribution of the data in the multimedia feature space and the historical RF information provided by users. A trace-ratio optimization problem is then formulated and solved by an efficient algorithm. The algorithms have been applied to several content-based multimedia retrieval applications, including cross-media retrieval, image retrieval, and 3D motion/pose data retrieval. Comprehensive experiments on four data sets demonstrate their advantages in precision, robustness, scalability, and computational efficiency.
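
The ranking step admits a closed form: given a graph Laplacian L and a query indicator vector y, minimizing f^T L f + lam * ||f - y||^2 gives f = lam * (L + lam*I)^{-1} y. A minimal sketch of that step, using a fixed Gaussian-affinity Laplacian in place of the Laplacian that LRGA learns (all names here are illustrative):

```python
import numpy as np

def laplacian_ranking(W, query_idx, lam=0.1):
    """Rank all points against a query with a graph Laplacian.

    Solves min_f f^T L f + lam * ||f - y||^2, whose closed form is
    f = lam * (L + lam*I)^{-1} y.  The paper learns L via LRGA; a
    fixed affinity W is assumed here for illustration.
    """
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W          # unnormalized graph Laplacian
    y = np.zeros(n)
    y[query_idx] = 1.0                      # positive score on the query
    return np.linalg.solve(L + lam * np.eye(n), lam * y)

# toy data: two well-separated groups of 1-D points
X = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
W = np.exp(-(X[:, None] - X[None, :]) ** 2)
np.fill_diagonal(W, 0.0)
scores = laplacian_ranking(W, query_idx=0)  # neighbors of X[0] rank high
```

Points in the query's cluster receive high scores through the strong within-cluster edges, while the distant cluster receives almost none.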


IEEE Transactions on Multimedia | 2008

Harmonizing Hierarchical Manifolds for Multimedia Document Semantics Understanding and Cross-Media Retrieval

Yi Yang; Yueting Zhuang; Fei Wu; Yunhe Pan

In this paper, we consider the problem of multimedia document (MMD) semantics understanding and content-based cross-media retrieval. An MMD is a set of media objects of different modalities that carry the same semantics, and content-based cross-media retrieval is a new kind of retrieval in which the query examples and search results can be of different modalities. Two levels of manifolds are learned to explore the relationships among the data at the MMD level and at the media-object level, respectively. We first construct a Laplacian media object space to represent the media objects of each modality, and an MMD semantic graph to learn the MMD semantic correlations. The characteristics of media objects propagate along the MMD semantic graph, and an MMD semantic space is constructed in which cross-media retrieval is performed. Different methods are proposed to utilize relevance feedback, and experiments show that the proposed approaches are effective.


IEEE Transactions on Image Processing | 2010

Image Clustering Using Local Discriminant Models and Global Integration

Yi Yang; Dong Xu; Feiping Nie; Shuicheng Yan; Yueting Zhuang

In this paper, we propose a new image clustering algorithm, referred to as clustering using local discriminant models and global integration (LDMGI). To deal with data points sampled from a nonlinear manifold, for each data point we construct a local clique comprising the data point and its neighbors. Inspired by the Fisher criterion, we use a local discriminant model for each local clique to evaluate the clustering performance of the samples within the clique. To obtain the clustering result, we further propose a unified objective function to globally integrate the local models of all the cliques. With the unified objective function, spectral relaxation and spectral rotation are used to obtain the binary cluster indicator matrix for all the samples. We show that LDMGI shares a similar objective function with spectral clustering (SC) algorithms, e.g., normalized cut (NCut). In contrast to NCut, in which the Laplacian matrix is calculated directly from a Gaussian function, LDMGI learns a new Laplacian matrix by exploiting both manifold structure and local discriminant information. We also prove that k-means and discriminative k-means (DisKmeans) are both special cases of LDMGI. Extensive experiments on several benchmark image datasets demonstrate the effectiveness of LDMGI. We observe in the experiments that LDMGI is more robust to algorithmic parameters than NCut, which makes it more appealing for real-world image clustering applications in which ground truth is generally not available for parameter tuning.
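
For context, the NCut-style baseline that LDMGI is compared against can be sketched in a few lines: a fixed Gaussian affinity, the normalized Laplacian, its bottom-k eigenvectors, and a discretization step. This sketch uses plain k-means where the paper uses spectral rotation, and a fixed affinity where LDMGI learns the Laplacian (names are illustrative):

```python
import numpy as np

def spectral_cluster(X, k, sigma=1.0, n_iter=50):
    """NCut-style spectral clustering: Gaussian affinity, normalized
    Laplacian, bottom-k eigenvectors, then k-means on the rows."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))        # fixed Gaussian affinity
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)             # eigenvalues ascending
    U = vecs[:, :k]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)  # row-normalize
    # plain k-means to discretize (the paper uses spectral rotation)
    centers = U[np.linspace(0, n - 1, k).astype(int)].copy()
    for _ in range(n_iter):
        labels = ((U[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = U[labels == j].mean(axis=0)
    return labels

X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [5.0, 5.0], [5.2, 5.1], [5.1, 4.9]])
labels = spectral_cluster(X, k=2)               # two clusters recovered
```

The fixed Gaussian affinity is exactly the parameter sensitivity the abstract criticizes: results depend on sigma, whereas LDMGI's learned Laplacian does not.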


ACM Multimedia | 2009

Ranking with local regression and global alignment for cross media retrieval

Yi Yang; Dong Xu; Feiping Nie; Jiebo Luo; Yueting Zhuang

Rich multimedia content, including images, audio, and text, is frequently used to describe the same semantics in e-learning and e-business web pages, instructive slides, multimedia encyclopedias, and so on. In this paper, we present a framework for cross-media retrieval, where the query example and the retrieved result(s) can be of different media types. We first construct a Multimedia Correlation Space (MMCS) by exploring the semantic correlation of different multimedia modalities, utilizing both multimedia content and co-occurrence information. We then propose a novel ranking algorithm, ranking with Local Regression and Global Alignment (LRGA), which learns a robust Laplacian matrix for data ranking. In LRGA, for each data point, a local linear regression model is used to predict the ranking values of its neighboring points. We propose a unified objective function to globally align the local models from all the data points so that an optimal ranking value can be assigned to each data point. LRGA is insensitive to parameters, making it particularly suitable for data ranking. A relevance feedback algorithm is proposed to improve the retrieval performance. Comprehensive experiments demonstrate the effectiveness of our methods.


Computer Vision and Image Understanding | 2003

3D motion retrieval with motion index tree

Feng Liu; Yueting Zhuang; Fei Wu; Yunhe Pan

With the development of motion capture techniques, more and more 3D motion libraries have become available. In this paper, we present a novel content-based 3D motion retrieval algorithm. We partition the motion library and construct a motion index tree based on a hierarchical motion description. The motion index tree serves as a classifier that determines the sub-library containing the motions most similar to the query sample. A Nearest-Neighbor-rule-based dynamic clustering algorithm is adopted to partition the library and construct the motion index tree. The similarity between the sample and each motion in the sub-library is calculated through elastic matching. To improve the efficiency of the similarity calculation, an adaptive clustering-based key-frame extraction algorithm is adopted. Experiments demonstrate the effectiveness of the algorithm.


IEEE Transactions on Multimedia | 2008

Mining Semantic Correlation of Heterogeneous Multimedia Data for Cross-Media Retrieval

Yueting Zhuang; Yi Yang; Fei Wu

Although multimedia objects such as images, audio, and text are of different modalities, there are a great number of semantic correlations among them. In this paper, we propose a transductive learning method to mine the semantic correlations among media objects of different modalities in order to achieve cross-media retrieval. Cross-media retrieval is a new kind of search technology in which the query examples and the returned results can be of different modalities, e.g., querying images with an audio example. First, according to the media objects' features and their co-existence information, we construct a uniform cross-media correlation graph in which media objects of different modalities are represented uniformly. To perform cross-media retrieval, a positive score is assigned to the query example; the score spreads along the graph, and the media objects or multimedia documents (MMDs) of the target modality with the highest scores are returned. To boost retrieval performance, we also propose long-term and short-term relevance feedback approaches to mine the information contained in positive and negative examples.
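
The score-spreading step matches the familiar manifold-ranking iteration f <- alpha * S f + (1 - alpha) * y on the normalized graph. A toy sketch in which a small hand-built graph stands in for the learned cross-media correlation graph (all names hypothetical):

```python
import numpy as np

def spread_scores(W, query_idx, alpha=0.9, n_iter=200):
    """Spread a positive score from the query node over the graph:
    f <- alpha * S f + (1 - alpha) * y, with S the symmetrically
    normalized affinity matrix."""
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))
    y = np.zeros(W.shape[0])
    y[query_idx] = 1.0
    f = y.copy()
    for _ in range(n_iter):
        f = alpha * (S @ f) + (1.0 - alpha) * y   # propagate, re-inject query
    return f

# nodes 0-2: one modality cluster; nodes 3-5: another modality;
# one weak cross-modal edge (2, 3) from co-existence information
W = np.zeros((6, 6))
for i, j, w in [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
                (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 0.2)]:
    W[i, j] = W[j, i] = w
scores = spread_scores(W, query_idx=0)   # scores leak across the weak link
```

Even the single weak cross-modal edge lets the query's score reach the other modality, which is how a query in one modality can rank results in another.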


Pattern Recognition | 2007

Hallucinating faces: LPH super-resolution and neighbor reconstruction for residue compensation

Yueting Zhuang; Jian Zhang; Fei Wu

A two-phase face hallucination approach is proposed in this paper to infer a high-resolution face image from a low-resolution observation based on a set of training image pairs. The proposed locality preserving hallucination (LPH) algorithm combines locality preserving projection (LPP) and radial basis function (RBF) regression to hallucinate the global high-resolution face. Furthermore, to complement the inferred global face with detailed, natural facial features, neighbor-reconstruction-based face residue hallucination is used. Compared with existing approaches, the proposed LPH algorithm efficiently generates a global face more similar to the ground-truth face; moreover, the patch structure and search strategy carefully designed for the neighbor reconstruction algorithm greatly reduce the computational complexity without diminishing the quality of high-resolution facial detail. The details of the synthesized high-resolution face are further improved by a global linear smoother. Experiments indicate that our approach can efficiently synthesize distinct high-resolution faces with various facial appearances, such as facial expressions and eyeglasses.
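
The neighbor reconstruction step can be sketched as LLE-style least squares: solve for weights that rebuild the low-resolution patch from its training neighbors, then transfer the same weights to the paired high-resolution patches. A minimal sketch that omits the paper's patch structure and search strategy (names and toy values hypothetical):

```python
import numpy as np

def reconstruct_patch(lr_patch, lr_train, hr_train, eps=1e-6):
    """LLE-style neighbor reconstruction: least-squares weights that
    rebuild the LR patch from training LR patches are transferred to
    the paired HR patches."""
    Z = lr_train - lr_patch                  # neighbor differences (k, d)
    G = Z @ Z.T + eps * np.eye(len(Z))       # regularized Gram matrix
    w = np.linalg.solve(G, np.ones(len(Z)))
    w = w / w.sum()                          # weights sum to one
    return w @ hr_train                      # same weights on the HR pairs

# toy LR/HR training pairs (values made up for illustration)
lr_train = np.array([[0.0, 0.0], [1.0, 1.0]])
hr_train = np.array([[0.0, 0.0, 0.0, 0.0], [2.0, 2.0, 2.0, 2.0]])
hr_patch = reconstruct_patch(np.array([0.0, 0.0]), lr_train, hr_train)
```

When the input exactly matches a training LR patch, nearly all of the weight lands on that neighbor and its HR counterpart is returned almost unchanged.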


IEEE Transactions on Image Processing | 2016

DeepSaliency: Multi-Task Deep Neural Network Model for Salient Object Detection

Xi Li; Liming Zhao; Lina Wei; Ming-Hsuan Yang; Fei Wu; Yueting Zhuang; Haibin Ling; Jingdong Wang

A key problem in salient object detection is how to effectively model the semantic properties of salient objects in a data-driven manner. In this paper, we propose a multi-task deep saliency model based on a fully convolutional neural network with global input (whole raw images) and global output (whole saliency maps). In principle, the proposed saliency model takes a data-driven strategy for encoding the underlying saliency prior information and then sets up a multi-task learning scheme for exploring the intrinsic correlations between saliency detection and semantic image segmentation. Through collaborative feature learning from these two correlated tasks, the shared fully convolutional layers produce effective features for object perception. Moreover, the model captures semantic information about salient objects across different levels through the fully convolutional layers, exploiting the feature-sharing properties of salient object detection with a great reduction in feature redundancy. Finally, we present a graph-Laplacian-regularized nonlinear regression model for saliency refinement. Experimental results demonstrate the effectiveness of our approach in comparison with state-of-the-art approaches.
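
The multi-task idea, one shared feature extractor feeding both a saliency head and a segmentation head, can be sketched with plain fully connected layers standing in for the fully convolutional network (all shapes and names here are hypothetical):

```python
import numpy as np

def multitask_forward(x, W_shared, W_sal, W_seg):
    """One shared representation, two task heads: a sigmoid saliency
    score per location and raw segmentation class logits."""
    h = np.maximum(0.0, W_shared @ x)               # shared ReLU features
    saliency = 1.0 / (1.0 + np.exp(-(W_sal @ h)))   # per-location, in (0, 1)
    seg_logits = W_seg @ h                          # per-class scores
    return saliency, seg_logits

rng = np.random.default_rng(0)
d_in, d_feat, n_loc, n_cls = 16, 8, 4, 3
saliency, seg_logits = multitask_forward(
    rng.normal(size=d_in),                  # flattened input features
    rng.normal(size=(d_feat, d_in)),        # shared layer
    rng.normal(size=(n_loc, d_feat)),       # saliency head
    rng.normal(size=(n_cls, d_feat)))       # segmentation head
```

Because both heads backpropagate into the same shared weights during training, features useful for segmentation also shape the saliency predictions, which is the feature sharing the abstract describes.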


IEEE Transactions on Image Processing | 2012

Web and Personal Image Annotation by Mining Label Correlation With Relaxed Visual Graph Embedding

Yi Yang; Fei Wu; Feiping Nie; Heng Tao Shen; Yueting Zhuang; Alexander G. Hauptmann

The number of digital images is rapidly increasing, and it has become an important challenge to organize these resources effectively. As a way to facilitate image categorization and retrieval, automatic image annotation has received much research attention. Considering the great number of unlabeled images available, it is beneficial to develop an effective mechanism that leverages unlabeled images for large-scale image annotation. Meanwhile, a single image is usually associated with multiple labels, which are inherently correlated. A straightforward method of image annotation is to decompose the problem into multiple independent single-label problems, but this ignores the underlying correlations among different labels. In this paper, we propose a new inductive algorithm for image annotation that integrates label correlation mining and visual similarity mining into a joint framework. We first construct a graph model according to image visual features. A multilabel classifier is then trained by simultaneously uncovering the shared structure common to different labels and the visual-graph-embedded label prediction matrix for image annotation. We show that the globally optimal solution of the proposed framework can be obtained by performing generalized eigen-decomposition. We apply the proposed framework to both web image annotation and personal album labeling, using the NUS-WIDE, MSRA MM 2.0, and Kodak image data sets with the AUC evaluation metric. Extensive experiments on large-scale image databases collected from the web and personal albums show that the proposed algorithm can utilize both labeled and unlabeled data for image annotation and outperforms other algorithms.


Computer Vision and Pattern Recognition | 2016

Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning

Pingbo Pan; Zhongwen Xu; Yi Yang; Fei Wu; Yueting Zhuang

Recently, deep learning approaches, especially deep Convolutional Neural Networks (ConvNets), have achieved overwhelming accuracy with fast processing speed for image classification. Incorporating temporal structure into deep ConvNets for video representation has become a fundamental problem in video content analysis. In this paper, we propose a new approach, the Hierarchical Recurrent Neural Encoder (HRNE), to exploit the temporal information of videos. Compared to recent video representation inference approaches, this paper makes three contributions. First, our HRNE can efficiently exploit video temporal structure over a longer range by reducing the length of the input information flow and compositing multiple consecutive inputs at a higher level. Second, computation is significantly reduced while more non-linearity is attained. Third, HRNE is able to uncover temporal transitions between frame chunks at different granularities, i.e., it can model the temporal transitions between frames as well as the transitions between segments. We apply the new method to video captioning, where temporal information plays a crucial role. Experiments demonstrate that our method outperforms the state-of-the-art on video captioning benchmarks.
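
The two-level encoding can be sketched as: split the frame sequence into fixed-length chunks, encode each chunk, then encode the sequence of chunk vectors. A minimal sketch with a plain tanh RNN standing in for the paper's LSTM units, and no attention (names hypothetical):

```python
import numpy as np

def rnn_encode(seq, W_x, W_h):
    """Plain tanh RNN over seq (T, d_in); returns the last hidden state."""
    h = np.zeros(W_h.shape[0])
    for x in seq:
        h = np.tanh(W_x @ x + W_h @ h)
    return h

def hierarchical_encode(frames, chunk_len, params):
    """Two-level encoding: chunk the frames, encode each chunk, then
    encode the sequence of chunk vectors into one video vector."""
    Wx1, Wh1, Wx2, Wh2 = params
    chunks = [frames[i:i + chunk_len]
              for i in range(0, len(frames), chunk_len)]
    chunk_vecs = np.stack([rnn_encode(c, Wx1, Wh1) for c in chunks])
    return rnn_encode(chunk_vecs, Wx2, Wh2)

rng = np.random.default_rng(0)
d_in, d_h = 8, 4
params = (0.1 * rng.normal(size=(d_h, d_in)),   # level-1 input weights
          0.1 * rng.normal(size=(d_h, d_h)),    # level-1 recurrent weights
          0.1 * rng.normal(size=(d_h, d_h)),    # level-2 input weights
          0.1 * rng.normal(size=(d_h, d_h)))    # level-2 recurrent weights
frames = rng.normal(size=(12, d_in))            # 12 frames of features
video_vec = hierarchical_encode(frames, chunk_len=4, params=params)
```

With chunks of length 4, the level-2 recurrence runs over only 3 steps instead of 12, which is the shortened information flow the first contribution refers to.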

Collaboration


Dive into Yueting Zhuang's collaborations.

Top Co-Authors

Fei Wu

Zhejiang University

Xi Li

Zhejiang University
