Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Kuiyuan Yang is active.

Publication


Featured research published by Kuiyuan Yang.


IEEE Transactions on Multimedia | 2010

Towards a Relevant and Diverse Search of Social Images

Meng Wang; Kuiyuan Yang; Xian-Sheng Hua; Hong-Jiang Zhang

Recent years have witnessed the great success of social media websites. Tag-based image search is an important approach to accessing the image content on these websites. However, the existing ranking methods for tag-based image search frequently return results that are irrelevant or not diverse. This paper proposes a diverse relevance ranking scheme that takes both relevance and diversity into account by exploring the content of images and their associated tags. First, it estimates the relevance scores of images with respect to the query term based on both the visual information of the images and the semantic information of the associated tags. It then estimates the semantic similarities of social images based on their tags. Based on the relevance scores and the similarities, the ranking list is generated by a greedy ordering algorithm that optimizes Average Diverse Precision (ADP), a novel measure extended from the conventional average precision. Comprehensive experiments and user studies demonstrate the effectiveness of the approach. We also apply the scheme to web image search reranking and show that the diversity of search results can be enhanced while maintaining a comparable level of relevance.
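
The greedy ordering step lends itself to a compact illustration. The sketch below is not the authors' implementation: it assumes precomputed relevance scores rel and a pairwise tag-based similarity matrix sim, and uses an MMR-style trade-off (with a hypothetical weight lam) as a stand-in for the ADP-optimizing selection rule.

```python
def greedy_diverse_rank(rel, sim, lam=0.5):
    """Greedily order images by relevance penalized by similarity to
    already-ranked images; an MMR-style stand-in for the paper's
    ADP-optimizing greedy ordering."""
    remaining = set(range(len(rel)))
    ranking = []
    while remaining:
        def score(i):
            # Penalize the strongest similarity to anything ranked so far.
            penalty = max((sim[i][j] for j in ranking), default=0.0)
            return rel[i] - lam * penalty
        best = max(remaining, key=score)
        ranking.append(best)
        remaining.remove(best)
    return ranking
```

With lam = 0 this reduces to pure relevance ranking; larger values push near-duplicate images apart in the list.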


Computer Vision and Pattern Recognition | 2015

The application of two-level attention models in deep convolutional neural network for fine-grained image classification

Tianjun Xiao; Yichong Xu; Kuiyuan Yang; Jiaxing Zhang; Yuxin Peng; Zheng Zhang

Fine-grained classification is challenging because categories can only be discriminated by subtle and local differences. Variances in pose, scale, or rotation usually make the problem more difficult. Most fine-grained classification systems follow the pipeline of finding the foreground object or object parts (where) to extract discriminative features (what). In this paper, we propose to apply visual attention to the fine-grained classification task using deep neural networks. Our pipeline integrates three types of attention: the bottom-up attention that proposes candidate patches, the object-level top-down attention that selects patches relevant to a certain object, and the part-level top-down attention that localizes discriminative parts. We combine these attentions to train domain-specific deep nets, then use them to improve both the what and the where aspects. Importantly, we avoid using expensive annotations such as bounding boxes or part locations anywhere in the pipeline. This weak-supervision constraint makes our work easier to generalize. We have verified the effectiveness of the method on subsets of the ILSVRC2012 dataset and the CUB200-2011 dataset. Our pipeline delivers significant improvements and achieves the best accuracy under the weakest supervision condition, and its performance is competitive with methods that rely on additional annotations.
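
At inference time, the interplay of the bottom-up and object-level attentions can be sketched roughly as follows. The callables propose, filter_net, and class_net are hypothetical stand-ins for the patch proposer and the trained nets, and the part-level attention is omitted for brevity.

```python
def classify_with_attention(image, propose, filter_net, class_net, keep=0.2):
    """Combine bottom-up proposals with object-level top-down filtering,
    then average the classifier's predictions over the kept patches.
    All callables are hypothetical stand-ins for the trained nets."""
    patches = propose(image)                      # bottom-up attention
    scored = sorted(patches, key=filter_net, reverse=True)
    kept = scored[:max(1, int(keep * len(scored)))]
    # Average class scores over the patches deemed relevant.
    preds = [class_net(p) for p in kept]
    n_classes = len(preds[0])
    return [sum(p[c] for p in preds) / len(preds) for c in range(n_classes)]
```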


ACM Multimedia | 2014

Error-Driven Incremental Learning in Deep Convolutional Neural Network for Large-Scale Image Classification

Tianjun Xiao; Jiaxing Zhang; Kuiyuan Yang; Yuxin Peng; Zheng Zhang

Supervised learning using deep convolutional neural networks has shown promise in large-scale image classification tasks. As a building block, such a network is now well positioned to be part of larger systems that tackle real-life multimedia tasks. An unresolved issue is that such a model is trained on a static snapshot of data. Instead, this paper treats training as a continuous learning process as new classes of data arrive. A system with this capability is useful in practical scenarios, as it gradually expands its capacity to predict a growing number of new classes. It is also our attempt to address a more fundamental issue: a good learning system must deal with the new knowledge it is exposed to, much as humans do. We developed a training algorithm that grows a network not only incrementally but also hierarchically. Classes are grouped according to similarities and self-organized into levels. The newly added capacity is divided into component models that predict coarse-grained superclasses and component models that return the final prediction within a superclass. Importantly, all models are cloned from existing ones and can be trained in parallel. These models inherit features from existing ones and thus further speed up learning. Our experiments point out the advantages of this approach and also raise a few important open questions.
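
A coarse-to-fine prediction over such a hierarchy might look like the following sketch, where superclass_net and leaf_nets are hypothetical stand-ins for the cloned component models.

```python
def predict_coarse_to_fine(x, superclass_net, leaf_nets):
    """Route a sample through the hierarchy: a component model predicts
    the superclass, then the matching leaf model returns the final
    class. Both nets are hypothetical stand-ins for the cloned models."""
    coarse = superclass_net(x)                  # scores over superclasses
    s = max(range(len(coarse)), key=coarse.__getitem__)
    fine = leaf_nets[s](x)                      # scores within superclass s
    return s, max(range(len(fine)), key=fine.__getitem__)
```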


Workshop on Web-Scale Multimedia Corpus | 2009

Visual tag dictionary: interpreting tags with visual words

Meng Wang; Kuiyuan Yang; Xian-Sheng Hua; Hong-Jiang Zhang

Visual-word-based image representation has shown effectiveness in a wide variety of applications such as categorization, annotation, and search. By detecting keypoints in images and treating their patterns as visual words, an image can be represented as a bag of visual words, analogous to the bag-of-words representation of text documents. In this paper, we introduce a corpus named the visual tag dictionary. Unlike conventional dictionaries that define terms with textual words, the visual tag dictionary interprets each tag with visual words. The dictionary is constructed in a fully automatic way by exploring tagged image data on the Internet. With this dictionary, tags and images are connected via visual words, facilitating many applications. As examples, we empirically demonstrate the effectiveness of the dictionary in tag-based image search, tag ranking, and image annotation.
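
Conceptually, building such a dictionary amounts to accumulating visual-word counts per tag over a tagged corpus and normalizing. A minimal sketch, assuming a quantize function that maps an image to visual-word ids (e.g., keypoint detection plus codebook lookup):

```python
from collections import Counter, defaultdict

def build_visual_tag_dictionary(tagged_images, quantize):
    """Accumulate visual-word counts per tag and normalize them into
    per-tag distributions. `quantize` is an assumed interface mapping
    an image to a list of visual-word ids."""
    counts = defaultdict(Counter)
    for image, tags in tagged_images:
        words = quantize(image)                 # bag of visual words
        for tag in tags:
            counts[tag].update(words)
    dictionary = {}
    for tag, ctr in counts.items():
        total = sum(ctr.values())
        dictionary[tag] = {w: c / total for w, c in ctr.items()}
    return dictionary
```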


IEEE Transactions on Multimedia | 2011

Tag Tagging: Towards More Descriptive Keywords of Image Content

Kuiyuan Yang; Xian-Sheng Hua; Meng Wang; Hong-Jiang Zhang

Tags have been demonstrated to be effective and efficient for organizing and searching social image content. However, these human-provided keywords are far from a comprehensive description of the image content, which limits their effectiveness in tag-based image search. In this paper, we propose an automatic scheme called tag tagging to supplement semantic image descriptions by associating a group of property tags with each existing tag. For example, an initial tag “tiger” may be further tagged with “white”, “stripes”, and “bottom-right” along three tag properties: color, texture, and location, respectively. In this way, the descriptive ability of the existing tags can be greatly enhanced. In the proposed scheme, a lazy learning approach is first applied to estimate the corresponding image regions of each initial tag, and then a set of property tags that correspond to six properties, including location, color, texture, size, shape, and dominance, are derived for each initial tag. These tag properties enable much more precise image search especially when certain tag properties are included in the query. The results of the empirical evaluation show that tag properties remarkably boost the performance of social image retrieval.
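
As one concrete example of deriving a property tag, the location property (such as the "bottom-right" above) can be read off the estimated region's normalized centroid. The thirds-based thresholds below are illustrative, not the paper's exact rule.

```python
def location_property(region_box, image_size):
    """Derive a coarse location property tag such as "bottom-right"
    for a tag's estimated image region. Thresholds are illustrative."""
    x0, y0, x1, y1 = region_box
    w, h = image_size
    cx, cy = (x0 + x1) / 2 / w, (y0 + y1) / 2 / h   # normalized centroid
    horiz = "left" if cx < 1/3 else ("right" if cx > 2/3 else "center")
    vert = "top" if cy < 1/3 else ("bottom" if cy > 2/3 else "middle")
    parts = [p for p in (vert, horiz) if p not in ("middle", "center")]
    return "-".join(parts) or "center"
```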


Computer Vision and Pattern Recognition | 2016

You Lead, We Exceed: Labor-Free Video Concept Learning by Jointly Exploiting Web Videos and Images

Chuang Gan; Kuiyuan Yang; Yi Yang; Tao Mei

Video concept learning often requires a large set of training samples. In practice, however, acquiring noise-free training labels with sufficient positive examples is very expensive. A plausible solution for training-data collection is to sample from the vast quantities of images and videos on the Web. Such a solution is motivated by the assumption that the retrieved images or videos are highly correlated with the query. Still, a number of challenges remain. First, Web videos are often untrimmed, so only parts of a video are relevant to the query. Second, the retrieved Web images are always highly relevant to the issued query; however, thoughtlessly utilizing them in the video domain may even hurt performance due to the well-known semantic-drift and domain-gap problems. As a result, a valid question is how Web images and videos should interact for video concept learning. In this paper, we propose a Lead-Exceed Neural Network (LENN), which reinforces the training on Web images and videos in a curriculum manner. Specifically, training proceeds by first inputting frames of Web videos to obtain a network. The Web images are then filtered by the learnt network, and the selected images are additionally fed into the network to enhance the architecture and further trim the videos. In addition, Long Short-Term Memory (LSTM) can be applied to the trimmed videos to exploit temporal information. Encouraging results are reported on UCF101, TRECVID 2013, and 2014 MEDTest in the context of both action recognition and event detection. Without using human-annotated exemplars, our proposed LENN achieves 74.4% accuracy on the UCF101 dataset.
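
The curriculum can be summarized in a few lines. The sketch below assumes a net object with hypothetical fit/score methods and omits the video re-trimming and LSTM stages; it illustrates the lead-then-exceed order of training, not the authors' code.

```python
def curriculum_train(video_frames, web_images, net, rounds=2, thresh=0.8):
    """Lead-exceed curriculum sketch: learn from Web-video frames first,
    then let the learnt network filter Web images and continue training
    on the confident ones. `net.fit`/`net.score` are assumed methods."""
    net.fit(video_frames)                  # lead: frames of Web videos
    for _ in range(rounds):
        selected = [(x, y) for x, y in web_images
                    if net.score(x, y) >= thresh]
        net.fit(selected)                  # exceed: add filtered Web images
    return net
```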


IEEE Transactions on Multimedia | 2015

Query-Dependent Aesthetic Model With Deep Learning for Photo Quality Assessment

Xinmei Tian; Zhe Dong; Kuiyuan Yang; Tao Mei

The automatic assessment of photo quality from an aesthetic perspective is a very challenging problem. Most existing research has predominantly focused on the learning of a universal aesthetic model based on hand-crafted visual descriptors. However, this research paradigm can achieve only limited success because (1) such hand-crafted descriptors cannot well preserve abstract aesthetic properties, and (2) such a universal model cannot always capture the full diversity of visual content. To address these challenges, we propose in this paper a novel query-dependent aesthetic model with deep learning for photo quality assessment. In our method, deep aesthetic abstractions are discovered from massive images, whereas the aesthetic assessment model is learned in a query-dependent manner. Our work addresses the first problem by learning mid-level aesthetic feature abstractions via powerful deep convolutional neural networks to automatically capture the underlying aesthetic characteristics of the massive training images. Regarding the second problem, because photographers tend to employ different rules of photography for capturing different images, the aesthetic model should also be query-dependent. Specifically, given an image to be assessed, we first identify which aesthetic model should be applied for this particular image. Then, we build a unique aesthetic model of this type to assess its aesthetic quality. We conducted extensive experiments on two large-scale datasets and demonstrated that the proposed query-dependent model equipped with learned deep aesthetic abstractions significantly and consistently outperforms state-of-the-art hand-crafted-feature-based and universal-model-based methods.
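
The two-stage decision, first picking the applicable aesthetic model and then scoring with it, reduces to a small routing function. The selector, per-type aesthetic_models, and features extractor below are assumed interfaces, not the paper's code.

```python
def assess_quality(image, features, selector, aesthetic_models):
    """Query-dependent assessment sketch: extract learned aesthetic
    abstractions, decide which model applies, then score with it."""
    f = features(image)          # deep aesthetic feature abstractions
    model_id = selector(f)       # which photographic regime applies
    return aesthetic_models[model_id].predict(f)
```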


Conference on Multimedia Modeling | 2010

Social image search with diverse relevance ranking

Kuiyuan Yang; Meng Wang; Xian-Sheng Hua; Hong-Jiang Zhang

Recent years have witnessed the success of many online social media websites, which allow users to create and share media content as well as describe it with tags. However, the existing ranking approaches for tag-based image search frequently return results that are irrelevant or lack diversity. This paper proposes a diverse relevance ranking scheme that simultaneously takes relevance and diversity into account by exploiting both the content of images and their associated tags. First, it estimates the relevance scores of images with respect to the query term based on both the visual information of the images and the semantic information of the associated tags. It then mines the semantic similarities of social images based on their tags. With the relevance scores and the similarities, the ranking list is generated by a greedy ordering algorithm that optimizes Average Diverse Precision (ADP), a novel measure extended from the conventional Average Precision (AP). Comprehensive experiments and user studies demonstrate the effectiveness of the approach.
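
For reference, the conventional Average Precision that ADP extends is computed as below; the diversity-aware extension itself is defined in the paper and not reproduced here.

```python
def average_precision(ranked_relevance):
    """Conventional AP over a ranked list of binary relevance labels.
    ADP extends this by additionally rewarding diversity among the
    relevant results; see the paper for its exact definition."""
    hits, precision_sum = 0, 0.0
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / k
    return precision_sum / hits if hits else 0.0
```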


ACM Multimedia | 2014

Bag-of-Words Based Deep Neural Network for Image Retrieval

Yalong Bai; Wei Yu; Tianjun Xiao; Chang Xu; Kuiyuan Yang; Wei-Ying Ma; Tiejun Zhao

This work targets the image retrieval task held by the MSR-Bing Grand Challenge. Image retrieval is considered a challenging task because of the gap between low-level image representations and high-level textual query representations. Recently developed deep neural networks shed light on narrowing this gap by learning high-level image representations from raw pixels. In this paper, we propose a bag-of-words based deep neural network for the image retrieval task, which learns a high-level image representation and maps images into a bag-of-words space. The DNN model is trained on large-scale clickthrough data. The relevance between a query and an image is measured by the cosine similarity between the query's bag-of-words representation and the image's bag-of-words representation predicted by the DNN; the visual similarity of images is computed from the high-level image representation extracted via the same DNN model. Finally, the PageRank algorithm is used to further improve the ranking list by considering the visual similarity of images for each query. The experimental results achieved state-of-the-art performance and verified the effectiveness of our proposed method.
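
The two generic building blocks named here, cosine similarity between bag-of-words vectors and PageRank over a visual-similarity graph, can be sketched as follows; the row-normalization, damping factor, and iteration count are illustrative choices.

```python
import math

def cosine(u, v):
    """Cosine similarity between two bag-of-words vectors, as used to
    match a query's BoW vector with an image's DNN-predicted BoW vector."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def pagerank(sim, d=0.85, iters=50):
    """Power-iteration PageRank over a visual-similarity matrix."""
    n = len(sim)
    trans = []
    for row in sim:                       # row-normalize similarities
        s = sum(row)
        trans.append([x / s for x in row] if s else [1.0 / n] * n)
    p = [1.0 / n] * n
    for _ in range(iters):
        p = [(1 - d) / n + d * sum(p[j] * trans[j][i] for j in range(n))
             for i in range(n)]
    return p
```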


International World Wide Web Conference | 2015

Tagging Personal Photos with Transfer Deep Learning

Jianlong Fu; Tao Mei; Kuiyuan Yang; Hanqing Lu; Yong Rui

The advent of mobile devices and media cloud services has led to unprecedented growth in personal photo collections. One of the fundamental problems in managing the increasing number of photos is automatic image tagging. Existing research has predominantly focused on tagging general Web images with a well-labelled image database, e.g., ImageNet. However, such approaches achieve only limited success on personal photos due to the domain gaps between personal photos and Web images. These gaps originate from differences in semantic distribution and visual appearance. To deal with these challenges, in this paper we present a novel transfer deep learning approach to tag personal photos. Specifically, to address the semantic distribution gap, we have designed an ontology consisting of a hierarchical vocabulary tailored for personal photos. This ontology is mined from

Collaboration


Dive into Kuiyuan Yang's collaborations.

Top Co-Authors

Meng Wang (Microsoft Research Asia, China)

Wei Yu (Harbin Institute of Technology)

Yalong Bai (Harbin Institute of Technology)

Hongxun Yao (Harbin Institute of Technology)

Tiejun Zhao (Harbin Institute of Technology)