Network


Latest external collaborations at the country level.

Hotspot


Research topics in which Siliang Tang is active.

Publication


Featured research published by Siliang Tang.


IEEE Transactions on Multimedia | 2014

Sparse Multi-Modal Hashing

Fei Wu; Zhou Yu; Yi Yang; Siliang Tang; Yin Zhang; Yueting Zhuang

Learning hash functions across heterogeneous high-dimensional features is desirable for many applications involving multi-modal data objects. In this paper, we propose an approach that obtains sparse codesets for data objects across different modalities via joint multi-modal dictionary learning, which we call sparse multi-modal hashing (abbreviated as SM2H). In SM2H, both intra-modality and inter-modality similarity are first modeled by a hypergraph, then multi-modal dictionaries are jointly learned by hypergraph Laplacian sparse coding. Based on the learned dictionaries, the sparse codeset of each data object is acquired and used for multi-modal approximate nearest neighbor retrieval under a sensitive Jaccard metric. Experimental results show that SM2H outperforms other methods in terms of mAP and Percentage on two real-world data sets.
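
A minimal sketch of the retrieval step only, not the paper's implementation: the hypergraph Laplacian sparse coding stage is omitted, and the paper's "sensitive Jaccard metric" is stood in for by the common weighted-Jaccard form for comparing non-negative sparse codes.

```python
# Rank database items by weighted-Jaccard similarity over sparse codesets.
# This is an illustrative stand-in, not SM2H's exact metric.
import numpy as np

def weighted_jaccard(a, b):
    # Both inputs are non-negative sparse code vectors.
    return np.minimum(a, b).sum() / np.maximum(a, b).sum()

def rank_by_codes(query_code, db_codes):
    # Return database indices sorted by similarity to the query's codeset.
    sims = np.array([weighted_jaccard(query_code, c) for c in db_codes])
    return np.argsort(-sims)

rng = np.random.default_rng(0)
db = np.abs(rng.standard_normal((100, 64))) * (rng.random((100, 64)) < 0.1)
q = db[3] + 0.01 * np.abs(rng.standard_normal(64))
print(rank_by_codes(q, db)[:5])  # index 3 should rank near the top
```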


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2013

A low rank structural large margin method for cross-modal ranking

Xinyan Lu; Fei Wu; Siliang Tang; Zhongfei Zhang; Xiaofei He; Yueting Zhuang

Cross-modal retrieval is a classic research topic in multimedia information retrieval. Traditional approaches treat it as a problem of learning a pairwise similarity function. In this paper, we instead consider it as a listwise ranking problem and propose a general cross-modal ranking algorithm that optimizes the listwise ranking loss with a low-rank embedding, which we call Latent Semantic Cross-Modal Ranking (LSCMR). The latent low-rank embedding space is discriminatively learned by structural large-margin learning to directly optimize certain ranking criteria. We evaluate LSCMR on the Wikipedia and NUS-WIDE datasets. Experimental results show that this method obtains significant improvements over state-of-the-art methods.
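
A minimal sketch of the two ingredients the abstract names, not LSCMR itself: candidates are scored through a shared low-rank embedding (factor matrices U and V with r much smaller than the feature dimensions), and a listwise loss is computed over the whole candidate list. The structural large-margin training is replaced here by a simple ListMLE-style likelihood for illustration; all shapes are made up.

```python
import numpy as np

def listwise_loss(query, docs, U, V):
    # Embed both modalities into a shared r-dimensional latent space.
    q = query @ U                      # (r,)
    D = docs @ V                       # (n, r)
    scores = D @ q                     # one score per candidate document
    # ListMLE: negative log-likelihood of the ground-truth ordering,
    # assuming `docs` is sorted from most to least relevant.
    rev_cumsum = np.cumsum(np.exp(scores)[::-1])[::-1]
    return float(np.sum(np.log(rev_cumsum) - scores))

rng = np.random.default_rng(0)
U, V = rng.standard_normal((300, 10)), rng.standard_normal((500, 10))
loss = listwise_loss(rng.standard_normal(300),
                     rng.standard_normal((5, 500)), U, V)
```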


ACM Multimedia | 2014

Cross-Media Hashing with Neural Networks

Yueting Zhuang; Zhou Yu; Wei Wang; Fei Wu; Siliang Tang; Jian Shao

Cross-media hashing, which conducts cross-media retrieval by embedding data from different modalities into a common low-dimensional Hamming space, has attracted intensive attention in recent years. This is motivated by the facts that a) multi-modal data is widespread, e.g., web images on Flickr are associated with tags, and b) hashing is an effective technique for large-scale high-dimensional data processing, which is exactly the situation of cross-media retrieval. Inspired by recent advances in deep learning, we propose a cross-media hashing approach based on multi-modal neural networks. By requiring in the learning objective that a) the hash codes for relevant cross-media data be similar and b) the hash codes be discriminative for predicting the class labels, the learned Hamming space is expected to capture the cross-media semantic relationships well and to be semantically discriminative. Experiments on two real-world data sets show that our approach achieves superior cross-media retrieval performance compared with state-of-the-art methods.
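
A hypothetical sketch of the two objective terms the abstract names, not the authors' network: per-modality encoders ending in tanh (relaxed binary codes), a pull-together loss for relevant cross-media pairs, and a shared softmax classifier on the codes so they stay label-discriminative. The layer sizes and input dimensions (4096-d image features, 1000-d text features) are assumptions.

```python
import torch
import torch.nn as nn

code_bits, n_classes = 32, 10
img_net = nn.Sequential(nn.Linear(4096, 256), nn.ReLU(),
                        nn.Linear(256, code_bits), nn.Tanh())
txt_net = nn.Sequential(nn.Linear(1000, 256), nn.ReLU(),
                        nn.Linear(256, code_bits), nn.Tanh())
classifier = nn.Linear(code_bits, n_classes)    # shared across modalities

def loss_fn(img, txt, labels):
    h_i, h_t = img_net(img), txt_net(txt)       # relaxed hash codes in (-1, 1)
    sim = ((h_i - h_t) ** 2).sum(dim=1).mean()  # relevant pairs: codes similar
    ce = nn.functional.cross_entropy(classifier(h_i), labels) \
       + nn.functional.cross_entropy(classifier(h_t), labels)
    return sim + ce

# At retrieval time, binarize: codes = torch.sign(img_net(img)).
```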


IEEE Transactions on Image Processing | 2015

Cross-Modal Learning to Rank via Latent Joint Representation

Fei Wu; Xinyang Jiang; Xi Li; Siliang Tang; Weiming Lu; Zhongfei Zhang; Yueting Zhuang

Cross-modal ranking is a research topic that is imperative to many applications involving multimodal data. Discovering a joint representation for multimodal data and learning a ranking function are essential to boost cross-media retrieval (i.e., image-query-text or text-query-image). In this paper, we propose an approach that discovers the latent joint representation of pairs of multimodal data (e.g., pairs of an image query and a text document) via a conditional random field and structural learning in a listwise ranking manner. We call this approach cross-modal learning to rank via latent joint representation (CML2R). In CML2R, the correlations between multimodal data are captured in terms of their shared hidden variables (e.g., topics), and a hidden-topic-driven discriminative ranking function is learned in a listwise manner. Experiments show that the proposed approach achieves good performance in cross-media retrieval while learning a discriminative representation of multimodal data.
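
An illustrative toy only, loosely in the spirit of CML2R's latent joint representation: a softmax posterior over K shared hidden topics plays the role of the CRF's hidden variables, and the ranking score is driven by that posterior. The real model's structural listwise training is omitted, and all shapes and parameter names are invented.

```python
import numpy as np

def score_pair(img_feat, txt_feat, A, B, w):
    # Compatibility of the pair with each of K shared hidden topics.
    logits = img_feat @ A + txt_feat @ B          # (K,)
    post = np.exp(logits - logits.max())
    post /= post.sum()                            # posterior over hidden topics
    return float(w @ post)                        # topic-driven ranking score

rng = np.random.default_rng(0)
K = 20
A = rng.standard_normal((128, K))                 # image-side parameters
B = rng.standard_normal((300, K))                 # text-side parameters
w = rng.standard_normal(K)
s = score_pair(rng.standard_normal(128), rng.standard_normal(300), A, B, w)
```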


ACM Multimedia | 2015

Deep Compositional Cross-modal Learning to Rank via Local-Global Alignment

Xinyang Jiang; Fei Wu; Xi Li; Zhou Zhao; Weiming Lu; Siliang Tang; Yueting Zhuang

Cross-modal retrieval is an active research topic that is imperative to many applications involving multi-modal data. Discovering an appropriate representation for multi-modal data and learning a ranking function are essential to boost cross-media retrieval. Motivated by the assumption that a compositional cross-modal semantic representation (over pairs of images and text) is better suited to cross-modal ranking, this paper exploits existing image-text databases to optimize a ranking function for cross-modal retrieval, called deep compositional cross-modal learning to rank (C2MLR). C2MLR learns a multi-modal embedding by optimizing a pairwise ranking problem while enhancing both local alignment and global alignment. In particular, the local alignment (i.e., the alignment of visual objects and textual words) and the global alignment (i.e., image-level and sentence-level alignment) are collaboratively utilized to learn the common multi-modal embedding space in a max-margin learning-to-rank manner. The experiments demonstrate the superiority of the proposed C2MLR, owing to its compositional multi-modal embedding.
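
A toy sketch of the two alignment terms the abstract describes, not C2MLR itself: a global image-sentence score plus a local score that matches each detected visual object to its best-aligned word, trained with a max-margin (triplet-style) ranking criterion. Feature extraction and all shapes are assumptions.

```python
import numpy as np

def align_score(img_global, obj_feats, sent_global, word_feats):
    # Global alignment: image-level vs sentence-level embeddings.
    global_term = img_global @ sent_global
    # Local alignment: each visual object votes for its best-matching word.
    local_term = (obj_feats @ word_feats.T).max(axis=1).sum()
    return global_term + local_term

def margin_loss(pos_score, neg_score, margin=1.0):
    # Max-margin learning to rank: a matched pair should beat a mismatched one.
    return max(0.0, margin - pos_score + neg_score)
```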


China Communications | 2013

The discovery of burst topic and its intermittent evolution in our real world

Siliang Tang; Yin Zhang; Hanqi Wang; Ming Chen; Fei Wu; Yueting Zhuang

Nowadays, a very large number of documents are available on many online news sites (e.g., CNN and NYT). The utilization of these online documents, for example the discovery of a burst topic and its evolution, is therefore a significant challenge. In this paper, a novel topic model called intermittent Evolution LDA (iELDA) is proposed. In iELDA, the time-evolving documents are divided into many small epochs. iELDA utilizes the detected global topics as priors to guide the detection of an emerging topic and keeps track of its evolution over different epochs. As a natural extension of traditional Latent Dirichlet Allocation (LDA) and the Dynamic Topic Model (DTM), iELDA has an advantage: it can discover the intermittent recurring pattern of a burst topic. We apply iELDA to real-world data from NYT; the results demonstrate that iELDA appropriately captures a burst topic and tracks its intermittent evolution, as well as yielding better predictive ability than other related topic models.
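
A minimal sketch of the epoch-splitting idea, not iELDA itself: fit a global LDA once, then re-fit per-epoch models whose topic-word prior (eta) is seeded from the global topics, so a topic that recurs intermittently can be followed across epochs. gensim's LdaModel does accept a (num_topics x vocabulary) eta matrix; everything else here, including using raw topic probabilities as the prior, is a simplifying assumption.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel

def per_epoch_models(epoch_docs, num_topics=10):
    # epoch_docs: list of epochs, each a list of tokenized documents.
    all_docs = [d for epoch in epoch_docs for d in epoch]
    vocab = Dictionary(all_docs)
    bow = lambda docs: [vocab.doc2bow(d) for d in docs]
    # Global topics over the whole collection.
    global_lda = LdaModel(bow(all_docs), id2word=vocab, num_topics=num_topics)
    eta = global_lda.get_topics()   # seed per-epoch topic-word priors
    return [LdaModel(bow(epoch), id2word=vocab, num_topics=num_topics, eta=eta)
            for epoch in epoch_docs]
```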


Sino-Foreign-Interchange Conference on Intelligent Science and Intelligent Data Engineering | 2012

Logistic tensor regression for classification

Xu Tan; Yin Zhang; Siliang Tang; Jian Shao; Fei Wu; Yueting Zhuang

Logistic regression is one of the classical approaches for classification and has been widely used in computer vision, bioinformatics, and multimedia understanding. However, when applied to high-dimensional data with structural information, such as facial images or motion data, traditional vector-based logistic regression suffers from two main weaknesses: it neglects the structural information, and it tends to overfit. In this paper, we propose Logistic Tensor Regression (LTR) for classification of high-dimensional data with structural information. The proposed LTR not only preserves the underlying structural information embedded in the data through tensorial representations, but also avoids overfitting by introducing a sparsity regularizer. Experiments on classification of facial images and motion data show that our proposed Logistic Tensor Regression approach outperforms state-of-the-art algorithms.
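
A minimal sketch of the core idea, not the paper's algorithm: logistic regression on matrix-shaped inputs (e.g., images) with a rank-1 weight "tensor" W = outer(u, v), which keeps row/column structure, plus an L1 term on u and v as the sparsity regularizer. Plain subgradient descent is used here for brevity; the rank-1 restriction and all hyperparameters are assumptions.

```python
import numpy as np

def fit_ltr(X, y, lam=0.01, lr=0.1, steps=500):
    # X: (n, rows, cols) matrix-valued samples; y: (n,) labels in {0, 1}.
    n, r, c = X.shape
    u, v = np.ones(r) / r, np.ones(c) / c
    for _ in range(steps):
        logits = np.einsum('nrc,r,c->n', X, u, v)   # <X_n, u v^T>
        err = 1.0 / (1.0 + np.exp(-logits)) - y     # sigmoid minus label
        gu = np.einsum('n,nrc,c->r', err, X, v) / n + lam * np.sign(u)
        gv = np.einsum('n,nrc,r->c', err, X, u) / n + lam * np.sign(v)
        u, v = u - lr * gu, v - lr * gv
    return u, v
```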


ACM Multimedia | 2012

Supervised cross-collection topic modeling

Haidong Gao; Siliang Tang; Yin Zhang; Dapeng Jiang; Fei Wu; Yueting Zhuang

Nowadays, vast amounts of multimedia data can be obtained across different collections (or domains). This poses significant challenges for the utilization of such cross-collection data, for example, summarizing the similarities and differences of data across different domains (e.g., CNN and NYT), or finding visually similar images across different visual domains (e.g., photos, paintings, and hand-drawn sketches). In this paper, a supervised cross-collection Latent Dirichlet Allocation (scLDA) approach is proposed to utilize data across different collections. As a natural extension of traditional Latent Dirichlet Allocation (LDA), scLDA not only takes the structural priors of different collections into consideration, but also exploits the category information. The strength of this work lies in integrating topic modeling, cross-domain learning, and supervised learning. We apply scLDA to comparative text mining as well as classification of news articles and images from different collections. The results suggest that our proposed scLDA generates meaningful collection-specific topics and achieves better retrieval accuracy than other related topic models.
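
A toy illustration, not scLDA: given a topic's word distribution estimated separately in two collections (e.g., CNN vs. NYT), split it into a shared part and collection-specific parts. This is the kind of commonality/difference summary cross-collection topic models aim for; scLDA does it inside a supervised generative model rather than by the post-hoc split shown here.

```python
import numpy as np

def split_topic(phi_a, phi_b):
    # phi_a, phi_b: the topic's word distributions in collections A and B.
    shared = np.minimum(phi_a, phi_b)   # probability mass both agree on
    return shared, phi_a - shared, phi_b - shared  # shared, A-only, B-only
```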


Advances in Multimedia | 2012

Image ranking via attribute boosted hypergraph

Zhou Yu; Siliang Tang; Yin Zhang; Jian Shao

Recently, visual attributes of images have become a research focus in computer vision and multimedia retrieval due to their describable, human-nameable nature for image understanding. In this paper, visual attributes are utilized to boost the results of image ranking. To better model images along with their visual attributes, a hypergraph is used to integrate the visual attributes with low-level image features. We then perform a ranking algorithm on the hypergraph. Experiments conducted on the Animals with Attributes (AwA) dataset demonstrate the effectiveness of our proposed approach.
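
A minimal sketch of generic hypergraph ranking (the Zhou-style hypergraph random walk this line of work builds on), not the paper's attribute-boosted construction: building hyperedges from attributes and low-level features is only stubbed by the incidence matrix H, where H[i, e] = 1 if image i belongs to hyperedge e (e.g., images sharing an attribute).

```python
import numpy as np

def hypergraph_rank(H, y, w=None, alpha=0.9, iters=100):
    # H: (n_vertices, n_hyperedges) incidence matrix; y: query indicator vector.
    n, m = H.shape
    w = np.ones(m) if w is None else w          # hyperedge weights
    dv = H @ w                                  # vertex degrees
    de = H.sum(axis=0)                          # hyperedge degrees
    Dv = np.diag(1.0 / np.sqrt(dv))
    S = Dv @ H @ np.diag(w / de) @ H.T @ Dv     # normalized transition matrix
    f = y.astype(float).copy()
    for _ in range(iters):                      # iterate to the ranking scores
        f = alpha * (S @ f) + (1 - alpha) * y
    return np.argsort(-f)                       # vertices, best-ranked first
```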


IEEE Transactions on Multimedia | 2015

Structured Visual Feature Learning for Classification via Supervised Probabilistic Tensor Factorization

Xu Tan; Fei Wu; Xi Li; Siliang Tang; Weiming Lu; Yueting Zhuang

In this paper, structured visual feature learning aims at exploiting the intrinsic structural properties of mutually correlated multimedia collections (e.g., video frames or facial images) to learn a more effective feature representation for multimedia data classification. We pose structured visual feature learning as a problem of supervised tensor factorization (STF), which is capable of effectively learning multi-view visual features from structural tensorial multimedia data. Mathematically, STF is formulated as a joint optimization framework combining probabilistic inference and ε-insensitive support vector regression. As a result, the feature representation obtained by STF not only preserves the intrinsic multi-view structural information of tensorial multimedia data, but also includes the discriminative information derived from the max-margin learning process. Using the learned discriminative visual features, we conduct a set of multimedia classification experiments on several challenging datasets, including images and videos, which demonstrate the effectiveness of our method.
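
A rough sketch of the pipeline's flavor, not STF itself: STF couples the factorization and the max-margin regression in one joint objective, whereas this toy simply extracts per-sample factors with a CP decomposition (tensorly) and feeds them to an ε-insensitive SVR (scikit-learn). Dropping that coupling, and all shapes, are the assumptions here.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac
from sklearn.svm import SVR

def factor_features(X, rank=5):
    # X: (n_samples, rows, cols) stack of matrix-valued observations.
    weights, factors = parafac(tl.tensor(X), rank=rank)
    return factors[0]        # mode-0 factor: one rank-dim row per sample

X = np.random.default_rng(0).standard_normal((60, 20, 15))
y = np.random.default_rng(1).standard_normal(60)
feats = factor_features(X)
model = SVR(epsilon=0.1).fit(feats, y)  # ε-insensitive support vector regression
```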

Collaboration


Siliang Tang's collaborations.

Top Co-Authors

Fei Wu

Zhejiang University

Xi Li

Zhejiang University