Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Ruifan Li is active.

Publication


Featured research published by Ruifan Li.


ACM Multimedia | 2014

Cross-modal Retrieval with Correspondence Autoencoder

Fangxiang Feng; Xiaojie Wang; Ruifan Li

The problem of cross-modal retrieval, e.g., using a text query to search for images and vice versa, is considered in this paper. A novel model, the correspondence autoencoder (Corr-AE), is proposed for solving this problem. The model is constructed by correlating the hidden representations of two uni-modal autoencoders. A novel optimization objective, which minimizes a linear combination of the representation learning error for each modality and the correlation learning error between the hidden representations of the two modalities, is used to train the model as a whole. Minimizing the correlation learning error forces the model to learn hidden representations that capture only the information common to the two modalities, while minimizing the representation learning error keeps the hidden representations good enough to reconstruct the input of each modality. A parameter α is used to balance the representation learning error and the correlation learning error. Based on two different multi-modal autoencoders, Corr-AE is extended to two other correspondence models, called Corr-Cross-AE and Corr-Full-AE. The proposed models are evaluated on three publicly available data sets from real scenes. We demonstrate that the three correspondence autoencoders perform significantly better than three canonical correlation analysis based models and two popular multi-modal deep models on cross-modal retrieval tasks.
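
As a rough illustration of the objective described above, the following is a minimal sketch (not the authors' released code) of a Corr-AE-style model in PyTorch; the layer sizes, sigmoid activations, and the exact way α weights the two error terms are assumptions based only on the abstract.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CorrAE(nn.Module):
    def __init__(self, img_dim, txt_dim, hid_dim):
        super().__init__()
        self.img_enc = nn.Linear(img_dim, hid_dim)   # image autoencoder
        self.img_dec = nn.Linear(hid_dim, img_dim)
        self.txt_enc = nn.Linear(txt_dim, hid_dim)   # text autoencoder
        self.txt_dec = nn.Linear(hid_dim, txt_dim)

    def forward(self, img, txt):
        h_img = torch.sigmoid(self.img_enc(img))
        h_txt = torch.sigmoid(self.txt_enc(txt))
        return h_img, h_txt, self.img_dec(h_img), self.txt_dec(h_txt)

def corr_ae_loss(model, img, txt, alpha=0.2):
    h_img, h_txt, img_rec, txt_rec = model(img, txt)
    recon = F.mse_loss(img_rec, img) + F.mse_loss(txt_rec, txt)  # representation learning error
    corr = F.mse_loss(h_img, h_txt)                              # correlation learning error
    return (1.0 - alpha) * recon + alpha * corr                  # alpha balances the two terms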


Neurocomputing | 2015

Deep correspondence restricted Boltzmann machine for cross-modal retrieval

Fangxiang Feng; Ruifan Li; Xiaojie Wang

The task of cross-modal retrieval, i.e., using a text query to search for images or vice versa, has received considerable attention with the rapid growth of multi-modal web data. Modeling the correlations between different modalities is key to tackling this problem. In this paper, we propose a correspondence restricted Boltzmann machine (Corr-RBM) to map the original features of bimodal data, such as image and text in our setting, into a low-dimensional common space, in which the heterogeneous data are comparable. In our Corr-RBM, two RBMs, built for image and text respectively, are connected at their individual hidden representation layers by a correlation loss function. A single objective function is constructed to trade off the correlation loss and the likelihoods of both modalities. Through the optimization of this objective function, our Corr-RBM is able to capture the correlations between the two modalities and learn the representation of each modality simultaneously. Furthermore, we construct two deep neural structures using Corr-RBM as the main building block for the task of cross-modal retrieval. A number of comparison experiments are performed on three public real-world data sets. All of our models show significantly better results than state-of-the-art models in both searching images via text query and vice versa.
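
Read as a formula, the single objective function described above can be sketched as follows, where h denotes the hidden representations of the two RBMs, p(v) their likelihoods of the visible units, and the trade-off weight β is an assumption rather than the paper's notation:

\min_{\theta}\; \mathcal{L}(\theta) \;=\; \beta\, \big\lVert h^{\mathrm{img}} - h^{\mathrm{txt}} \big\rVert_2^2 \;-\; (1-\beta)\,\big( \log p(v^{\mathrm{img}}) + \log p(v^{\mathrm{txt}}) \big)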


China Communications | 2017

Offline Urdu Nastaleeq optical character recognition based on stacked denoising autoencoder

Ibrar Ahmad; Xiaojie Wang; Ruifan Li; Shahid Rasheed

Offline Urdu Nastaleeq text recognition has long been a serious problem due to its very cursive nature. To avoid character segmentation problems, many researchers are shifting focus towards segmentation-free, ligature-based recognition approaches. The majority of prevalent ligature-based recognition systems rely heavily on hand-engineered feature extraction techniques. However, such techniques are error prone and often discard useful information that manual features can hardly recover. Moreover, most existing Urdu Nastaleeq text recognition systems have been trained and tested on small data sets. This paper proposes the use of stacked denoising autoencoders for automatic feature extraction directly from the raw pixel values of ligature images. Such deep learning networks have not been applied to the recognition of Urdu text thus far. Different stacked denoising autoencoders have been trained on 178,573 ligatures with 3,732 classes from the un-degraded (noise-free) UPTI (Urdu Printed Text Image) data set. Subsequently, the trained networks are validated and tested on degraded versions of the UPTI data set. The experimental results demonstrate accuracies in the range of 93% to 96%, better than existing Urdu OCR systems for such a large set of ligatures.
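
The core building block can be pictured with a small sketch of a single denoising-autoencoder layer (the corruption level, layer sizes, and loss are assumptions, not the paper's settings); layers of this kind are greedily stacked, and a classifier over the ligature classes is fine-tuned on top.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DenoisingAE(nn.Module):
    def __init__(self, in_dim, hid_dim, corruption=0.3):
        super().__init__()
        self.enc = nn.Linear(in_dim, hid_dim)
        self.dec = nn.Linear(hid_dim, in_dim)
        self.corruption = corruption

    def forward(self, x):
        # Masking noise: randomly zero a fraction of the raw pixel values.
        mask = (torch.rand_like(x) > self.corruption).float()
        h = torch.sigmoid(self.enc(x * mask))
        return torch.sigmoid(self.dec(h)), h

def dae_loss(model, x):
    # x: ligature image pixels flattened and scaled to [0, 1];
    # the layer is trained to reconstruct the clean input from the corrupted one.
    recon, _ = model(x)
    return F.binary_cross_entropy(recon, x)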


ACM Multimedia | 2015

Correspondence Autoencoders for Cross-Modal Retrieval

Fangxiang Feng; Xiaojie Wang; Ruifan Li; Ibrar Ahmad

This article considers the problem of cross-modal retrieval, such as using a text query to search for images and vice versa. Based on different autoencoders, several novel models are proposed for solving this problem. These models are constructed by correlating the hidden representations of a pair of autoencoders. A novel optimization objective, which minimizes a linear combination of the representation learning errors for each modality and the correlation learning error between the hidden representations of the two modalities, is used to train each model as a whole. Minimizing the correlation learning error forces the model to learn hidden representations that capture only the information common to the two modalities, while minimizing the representation learning error keeps the hidden representations good enough to reconstruct the inputs of each modality. To balance the two kinds of errors induced by representation learning and correlation learning, we set a specific parameter in our models. Furthermore, the models are divided into two groups according to the modalities they attempt to reconstruct. One group, including three models, is named multimodal reconstruction correspondence autoencoder since it reconstructs both modalities. The other group, including two models, is named unimodal reconstruction correspondence autoencoder since it reconstructs a single modality. The proposed models are evaluated on three publicly available datasets. Our experiments demonstrate that the proposed correspondence autoencoders perform significantly better than three canonical correlation analysis based models and two popular multimodal deep models on cross-modal retrieval tasks.
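
A hedged way to see how the two groups differ is by the reconstruction terms that enter the objective; the function names and the α weighting below are illustrative assumptions rather than the paper's notation.

import torch.nn.functional as F

def multimodal_reconstruction_loss(img, txt, img_from_img, txt_from_txt,
                                   img_from_txt, txt_from_img, h_img, h_txt, alpha=0.2):
    # Multimodal-reconstruction group: each hidden code reconstructs both modalities.
    recon = (F.mse_loss(img_from_img, img) + F.mse_loss(txt_from_txt, txt)
             + F.mse_loss(img_from_txt, img) + F.mse_loss(txt_from_img, txt))
    return (1 - alpha) * recon + alpha * F.mse_loss(h_img, h_txt)

def unimodal_reconstruction_loss(img, txt, img_rec, txt_rec, h_img, h_txt, alpha=0.2):
    # Unimodal-reconstruction group: each hidden code reconstructs a single modality.
    recon = F.mse_loss(img_rec, img) + F.mse_loss(txt_rec, txt)
    return (1 - alpha) * recon + alpha * F.mse_loss(h_img, h_txt)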


IEEE Access | 2017

Line and Ligature Segmentation of Urdu Nastaleeq Text

Ibrar Ahmad; Xiaojie Wang; Ruifan Li; Manzoor Ahmed; Rahat Ullah

The recognition accuracy of ligature-based Urdu optical character recognition (OCR) systems depends strongly on the accuracy of the segmentation that converts Urdu text into lines and ligatures. In general, line- and ligature-based Urdu OCR systems are more successful than character-based ones. This paper presents techniques for segmenting Urdu Nastaleeq text images into lines and subsequently into ligatures. The classical horizontal-projection-based segmentation method is augmented with a curved-line-split algorithm to overcome problems such as text-line split positions, overlapping and merged ligatures, and ligatures crossing line-split positions. The ligature segmentation algorithm extracts connected components from text lines, categorizes them into primary and secondary classes, and allocates secondary components to primary ligatures by examining width, height, coordinates, overlap, centroids, and baseline information. The proposed line segmentation algorithm is tested on 47 pages with 99.17% accuracy. The ligature segmentation algorithm is tested mainly on a large Urdu printed text image data set, which it segmented into 189,000 ligatures from 10,063 text lines containing 332,000 connected components. About 142,000 secondary components were successfully allocated to more than 189,000 primary ligatures with an accuracy rate of 99.80%. Both of the proposed segmentation algorithms thus outperform existing algorithms for Urdu Nastaleeq text segmentation. Moreover, the line segmentation algorithm is also tested on Arabic text, from which it extracts lines correctly.
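
For intuition, a toy version of the classical horizontal-projection step that the curved-line-split algorithm augments might look like the sketch below; the thresholding is an assumption, and the paper's actual algorithm additionally handles overlapping text and curved split positions that this sketch does not.

import numpy as np

def split_lines_by_projection(binary_page, min_gap=2):
    """binary_page: 2-D NumPy array, 1 = ink, 0 = background."""
    profile = binary_page.sum(axis=1)          # ink count per pixel row
    is_text = profile > 0
    lines, start = [], None
    for row, flag in enumerate(is_text):
        if flag and start is None:
            start = row                        # a text line begins
        elif not flag and start is not None:
            if row - start >= min_gap:
                lines.append((start, row))     # (top, bottom) of one text line
            start = None
    if start is not None:
        lines.append((start, len(is_text)))
    return lines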


Cluster Computing | 2017

Retrieving real world clothing images via multi-weight deep convolutional neural networks

Ruifan Li; Fangxiang Feng; Ibrar Ahmad; Xiaojie Wang

Clothing images are abundantly available on the Internet, especially from e-commerce platforms. Retrieving these images is important for commercial and social applications and has recently received tremendous attention from communities such as multimedia processing and computer vision. However, the large variations in clothing appearance and style, together with the large number of categories and attributes, make the problem challenging. Furthermore, the labels of real-world images provided by shop retailers on webpages are largely erroneous or incomplete, and the imbalance among image categories hinders effective learning. To overcome these problems, we adopt a multi-task deep learning framework to learn the representation and propose multi-weight deep convolutional neural networks for imbalanced learning. The topology of this network contains two groups of layers: shared layers at the bottom and task-dependent ones at the top. Furthermore, category-relevant parameters are incorporated to regularize the backward gradients for each category; a mathematical proof shows the relationship of this weighting to regulating the learning rates. Experiments demonstrate that our proposed joint framework and multi-weight neural networks can effectively learn robust representations and achieve better performance.
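
A minimal sketch of the shared-bottom, task-dependent-top layout with category-relevant weights is given below; the backbone, head sizes, and the use of inverse-frequency class weights are assumptions, not the paper's exact design.

import torch
import torch.nn as nn

class MultiWeightNet(nn.Module):
    def __init__(self, feat_dim, n_categories, n_attributes):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU())  # shared bottom layers
        self.cat_head = nn.Linear(512, n_categories)                      # task-dependent top layers
        self.attr_head = nn.Linear(512, n_attributes)

    def forward(self, x):
        h = self.shared(x)
        return self.cat_head(h), self.attr_head(h)

# Category-relevant weights (e.g., inverse class frequency) rescale each class's gradient,
# which acts like giving rare categories a larger effective learning rate.
category_weights = torch.tensor([1.0, 3.5, 0.8])            # hypothetical values for three classes
cat_criterion = nn.CrossEntropyLoss(weight=category_weights)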


Signal Processing | 2016

An EL-LDA based general color harmony model for photo aesthetics assessment

Peng Lu; Xujun Peng; Xinshan Zhu; Ruifan Li

The goal of photo aesthetics assessment is to build a computational model that can estimate the aesthetic quality of digital images with respect to human perception. As one of the most important features determining the aesthetic quality of an image, color harmony has gained increasing attention. To overcome the problems of most classical color harmony models, which rely heavily on heuristic rules and ignore the semantic information of images, we propose a statistical learning framework to train a color harmony model from a large number of natural images. In this framework, the semantic label information, which indicates the content of each image, is used along with the visual features to facilitate latent Dirichlet allocation (LDA) training. The degree of color harmony can then be estimated using supervised or unsupervised models and applied as the photo's aesthetics score. With the proposed color harmony model, we attempt to uncover the underlying principles that generate pleasing color combinations in natural images. The experimental results show that the proposed approach outperforms conventional heuristic color harmony models for image aesthetics assessment. Highlights: a novel framework is proposed to learn color harmony for photo aesthetics assessment; an extended labeled-LDA model is presented to learn complex color combinations; a thorough analysis of different color spaces for color harmony learning is given; the proposed method outperforms existing color harmony models.
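
As a stand-in illustration only (plain LDA instead of the paper's extended labeled LDA, and synthetic counts instead of real images), the idea of treating quantized colors as "words" whose topic likelihood acts as a crude harmony score can be sketched as follows.

import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
# 200 images, each summarized as a histogram over a 64-bin quantized color "vocabulary".
color_histograms = rng.integers(0, 20, size=(200, 64))

lda = LatentDirichletAllocation(n_components=8, random_state=0)
lda.fit(color_histograms)                     # learn common color-combination "topics"

new_image_hist = rng.integers(0, 20, size=(1, 64))
harmony_score = lda.score(new_image_hist)     # approximate log-likelihood under the learned topics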


International Joint Conference on Artificial Intelligence | 2018

Show and Tell More: Topic-Oriented Multi-Sentence Image Captioning

Yuzhao Mao; Chang Zhou; Xiaojie Wang; Ruifan Li

Image captioning aims to generate textual descriptions for images. Most previous work generates a single-sentence description for each image. However, a picture is worth a thousand words, and a single sentence can hardly give a complete view of an image, even for humans. In this paper, we propose a novel Topic-Oriented Multi-Sentence (TOMS) captioning model, which can generate multiple topic-oriented sentences to describe an image. Different from object instances or visual attributes, topics mined by latent Dirichlet allocation reflect hidden thematic structures in the reference sentences of an image. In our model, each topic is integrated into a caption generator through a Fusion Gate Unit (FGU) to guide the generation of a sentence towards a certain topic perspective. With multiple sentences from different topics, our TOMS provides a complete description of an image. Experimental results on both sentence and paragraph datasets demonstrate the effectiveness of our TOMS in terms of topical consistency and descriptive completeness.
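
A hedged sketch of the kind of gating the FGU describes is shown below: a learned gate mixes a topic vector into the decoder state before word prediction; the dimensions and the exact gating form are assumptions, not the paper's specification.

import torch
import torch.nn as nn

class FusionGateUnit(nn.Module):
    def __init__(self, topic_dim, state_dim):
        super().__init__()
        self.topic_proj = nn.Linear(topic_dim, state_dim)
        self.gate = nn.Linear(topic_dim + state_dim, state_dim)

    def forward(self, topic_vec, decoder_state):
        g = torch.sigmoid(self.gate(torch.cat([topic_vec, decoder_state], dim=-1)))
        # The gate decides, per dimension, how much the topic steers the sentence being generated.
        return g * self.topic_proj(topic_vec) + (1.0 - g) * decoder_state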


Mathematical Problems in Engineering | 2015

Obtaining Cross Modal Similarity Metric with Deep Neural Architecture

Ruifan Li; Fangxiang Feng; Xiaojie Wang; Peng Lu; Bohan Li



arXiv: Learning | 2013

Constructing Hierarchical Image-tags Bimodal Representations for Word Tags Alternative Choice.

Fangxiang Feng; Ruifan Li; Xiaojie Wang


Collaboration


Dive into Ruifan Li's collaborations.

Top Co-Authors

Xiaojie Wang, Beijing University of Posts and Telecommunications
Fangxiang Feng, Beijing University of Posts and Telecommunications
Peng Lu, Beijing University of Posts and Telecommunications
Ibrar Ahmad, Beijing University of Posts and Telecommunications
Yuzhao Mao, Beijing University of Posts and Telecommunications
Caixia Yuan, Beijing University of Posts and Telecommunications
Chang Zhou, Beijing University of Posts and Telecommunications
Haoyu Liang, Beijing University of Posts and Telecommunications