Publication


Featured research published by Zhineng Chen.


International World Wide Web Conference (WWW) | 2010

Context-oriented web video tag recommendation

Zhineng Chen; Juan Cao; Yicheng Song; Junbo Guo; Yongdong Zhang; Jintao Li

Tag recommendation is a common way to enrich the textual annotation of multimedia content. However, state-of-the-art recommendation methods are built upon pair-wise tag relevance, which hardly captures the context of a web video, i.e., who is doing what, when, and where. In this paper we propose the context-oriented tag recommendation (CtextR) approach, which expands tags for web videos under a context-consistent constraint. Given a web video, CtextR first collects the multi-form WWW resources describing the same event as the video, which produces an informative and consistent context; tag recommendation is then conducted based on the obtained context. Experiments on a collection of 80,031 web videos show that CtextR recommends a variety of relevant tags to web videos. Moreover, the enriched tags improve the performance of web video categorization.


Multimedia Tools and Applications | 2011

Web video retagging

Zhineng Chen; Juan Cao; Tian Xia; Yicheng Song; Yongdong Zhang; Jintao Li

Tags associated with web videos play a crucial role in organizing and accessing large-scale video collections. However, the raw tag list (RawL) is usually incomplete, imprecise, and unranked, which reduces the usability of tags. Meanwhile, compared with studies on improving the quality of web image tags, tags associated with web videos have not been studied to the same extent. In this paper, we propose a novel web video tag enhancement approach called video retagging, which aims at producing a more complete, precise, and ranked retagged tag list (RetL) for web videos. Given a web video, video retagging first collects its textually and visually related neighbor videos. All tags attached to the neighbors are treated as potentially relevant, and RetL is then generated by inferring the degree of relevance of the tags from both global and video-specific perspectives, using two different graph-based models. Two kinds of experiments, i.e., application-oriented video search and categorization, and user-based subjective studies, are carried out on a large-scale web video dataset. They demonstrate that in most cases RetL is better than RawL in terms of completeness, precision, and ranking.
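
The neighbor-gathering step described above can be sketched roughly as similarity-weighted tag voting; the paper's two graph-based relevance models are more involved, and the similarity weights, tag lists, and the `retag` helper below are illustrative assumptions:

```python
from collections import defaultdict

def retag(neighbors):
    """Score candidate tags for a web video from its neighbor videos.

    `neighbors` is a list of (similarity, tags) pairs, where `similarity`
    is the textual/visual similarity of a neighbor video and `tags` is its
    raw tag list. Every neighbor tag is a candidate; its relevance is the
    similarity-weighted vote it receives, yielding a ranked tag list.
    """
    scores = defaultdict(float)
    for similarity, tags in neighbors:
        for tag in set(tags):          # each neighbor votes once per tag
            scores[tag] += similarity
    return sorted(scores.items(), key=lambda kv: -kv[1])

neighbors = [
    (0.9, ["cat", "funny", "pet"]),
    (0.7, ["cat", "kitten"]),
    (0.4, ["funny", "fail"]),
]
ranked = retag(neighbors)              # "cat" receives the strongest vote
```

Tags absent from the raw list (here "kitten") can still enter the ranked output, which is how such voting completes an incomplete RawL.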


ACM Multimedia | 2010

Web video categorization based on Wikipedia categories and content-duplicated open resources

Zhineng Chen; Juan Cao; Yicheng Song; Yongdong Zhang; Jintao Li

This paper presents a novel approach for web video categorization by leveraging Wikipedia categories (WikiCs) and open resources describing the same content as the video, i.e., content-duplicated open resources (CDORs). Note that current approaches only collect CDORs within one or a few media forms and ignore CDORs of other forms. We explore all these resources by utilizing WikiCs and commercial search engines. Given a web video, its discriminative Wikipedia concepts are first identified and classified. A textual query is then constructed, from which CDORs are collected. Based on these CDORs, we propose to categorize web videos in the space spanned by WikiCs rather than that spanned by raw tags. Experimental results demonstrate the effectiveness of both the proposed CDOR collection method and the WikiC voting categorization algorithm. In addition, the categorization model built on both WikiCs and CDORs achieves better performance than models built on only one of them, as well as the state-of-the-art approach.


International Conference on Image Processing (ICIP) | 2014

Image character recognition using deep convolutional neural network learned from different languages

Jinfeng Bai; Zhineng Chen; Bailan Feng; Bo Xu

This paper proposes a shared-hidden-layer deep convolutional neural network (SHL-CNN) for image character recognition. In SHL-CNN, the hidden layers are made common across characters from different languages, performing a universal feature extraction process that aims at learning common character traits, such as strokes, that exist across languages, while the final softmax layer is made language dependent and is trained on characters from the target language only. This paper is the first attempt to introduce the SHL-CNN framework to image character recognition. Under the SHL-CNN framework, we discuss several issues, including the architecture and training of the network, from which a suitable SHL-CNN model for image character recognition is empirically learned. The effectiveness of the learned SHL-CNN is verified on both English and Chinese image character recognition tasks, showing that the SHL-CNN reduces recognition errors by 16-30% relative to conventional CNN models trained on characters of only one language, and by 35.7% relative to state-of-the-art methods. In addition, the shared hidden layers learned are also useful for unseen image character recognition tasks.
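
A minimal sketch of the shared-hidden-layer idea, using a single random linear layer in place of the shared convolutional stack; all names, dimensions, and the `SHLClassifier` class are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class SHLClassifier:
    """Shared hidden layer plus language-dependent softmax heads."""

    def __init__(self, in_dim, hidden_dim, classes_per_lang):
        # Shared hidden layer: common to characters of all languages.
        self.W_shared = rng.normal(0.0, 0.1, (in_dim, hidden_dim))
        # One softmax head per language, trained per-language in the paper.
        self.heads = {lang: rng.normal(0.0, 0.1, (hidden_dim, n))
                      for lang, n in classes_per_lang.items()}

    def features(self, x):
        # Universal feature extraction shared across languages.
        return np.maximum(0.0, x @ self.W_shared)

    def predict(self, x, lang):
        # Language-dependent classification over the shared features.
        return softmax(self.features(x) @ self.heads[lang])

model = SHLClassifier(in_dim=64, hidden_dim=32,
                      classes_per_lang={"english": 26, "chinese": 3755})
x = rng.normal(size=(2, 64))            # two fake flattened character images
p_en = model.predict(x, "english")      # distribution over 26 classes
p_zh = model.predict(x, "chinese")      # distribution over 3755 classes
```

The point of the structure is that `W_shared` is updated by samples of every language while each head sees only its own language's characters.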


Multimedia Tools and Applications | 2017

Detecting Uyghur text in complex background images with convolutional neural network

Shancheng Fang; Hongtao Xie; Zhineng Chen; Shiai Zhu; Xiaoyan Gu; Xingyu Gao

Uyghur text detection is crucial to a variety of real-world applications, yet little research has addressed it. In this paper, we develop an effective and efficient region-based convolutional neural network for Uyghur text detection in complex background images. The characteristics of the network include: (1) three region proposal networks, which simultaneously utilize feature maps from different convolutional layers, are used to improve recall; (2) the overall architecture is a fully convolutional network, and global average pooling replaces the fully connected layers in the classification and bounding box regression layers; (3) to fully utilize baseline information, Uyghur text lines are detected directly by the network in an end-to-end fashion. Experimental results on a benchmark dataset show that our method achieves an F-measure of 0.83 and a detection time of 0.6 s per image on a single K20c GPU, which is much faster than the state-of-the-art methods while maintaining competitive accuracy.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2010

Multi-modal query expansion for web video search

Bailan Feng; Juan Cao; Zhineng Chen; Yongdong Zhang; Shouxun Lin

Query expansion is an effective method to improve the usability of multimedia search. Most existing multimedia search engines can automatically expand a list of textual query terms based on text search techniques, which can be called textual query expansion (TQE). However, the annotations (title and tags) around web videos are generally too noisy for text-only query expansion and search matching. In this paper, we propose a novel multi-modal query expansion (MMQE) framework for web video search to solve this issue. Compared with traditional methods, MMQE provides a more intuitive query suggestion by transforming the textual query into a visual presentation based on visual clustering. In parallel, MMQE enhances search matching for intent-specific queries by joining textual, visual, and social cues from both the metadata and the content of videos. Experimental results on real web videos from YouTube demonstrate the effectiveness of the proposed method.


International Conference on Artificial Neural Networks (ICANN) | 2014

Chinese Image Character Recognition Using DNN and Machine Simulated Training Samples

Jinfeng Bai; Zhineng Chen; Bailan Feng; Bo Xu

Inspired by the success of deep neural network (DNN) models in solving challenging visual problems, this paper studies the task of Chinese Image Character Recognition (ChnICR) by leveraging a DNN model and a huge set of machine-simulated training samples. To generate the samples, clean machine-born Chinese characters are extracted and augmented with common variations of image characters, such as changes in size, font, boldness, and shift, as well as complex backgrounds, producing in total over 28 million character images that cover the vast majority of occurrences of Chinese characters in real-life images. Based on these samples, a DNN training procedure is employed to learn an appropriate Chinese character recognizer, where the width and depth of the DNN and the volume of samples are empirically discussed. In parallel, a holistic Chinese image text recognition system is developed. Encouraging experimental results on text from 13 TV channels demonstrate the effectiveness of the learned recognizer, which shows significant performance gains compared to the baseline system.
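
The sample-simulation idea above can be sketched for a toy glyph; the `simulate_samples` helper and its parameters are illustrative assumptions covering only two of the paper's variations (shift and noisy backgrounds), not its full generation pipeline:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_samples(glyph, n, max_shift=3, noise=0.2):
    """Generate machine-simulated training images from one clean glyph
    bitmap by applying random shifts and random noisy backgrounds."""
    h, w = glyph.shape
    samples = np.empty((n, h, w))
    for i in range(n):
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        # Random shift of the character within the image frame.
        shifted = np.roll(np.roll(glyph, dy, axis=0), dx, axis=1)
        # Random low-intensity background clutter.
        background = rng.uniform(0.0, noise, size=(h, w))
        samples[i] = np.clip(shifted + background, 0.0, 1.0)
    return samples

glyph = np.zeros((32, 32))
glyph[8:24, 14:18] = 1.0            # a fake vertical-stroke "character"
batch = simulate_samples(glyph, n=8)
```

Applying a handful of such variations per font to thousands of machine-born glyphs is how a generation scheme of this kind scales to millions of labeled images without manual annotation.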


Machine Vision and Applications | 2017

Robust and parallel Uyghur text localization in complex background images

Yun Song; Jianjun Chen; Hongtao Xie; Zhineng Chen; Xingyu Gao; Xi Chen

Uyghur text localization in complex background images is an important problem for Uyghur image content analysis. In this paper, we propose a robust Uyghur text localization method for complex background images and provide a CPU-GPU heterogeneous parallelization scheme. First, a multi-color-channel enhanced maximally stable extremal region (MSER) detector is used to extract components from images, which is robust to blur and low contrast. Second, a two-stage component classification system is used to filter out non-text components. Finally, a component connected graph algorithm is proposed to construct text lines. Experiments on the proposed dataset demonstrate that our algorithm compares favorably with state-of-the-art algorithms when handling Uyghur text. In addition, the heterogeneous parallel implementation achieves a 12.5x speedup.


International Conference on Multimedia Retrieval (ICMR) | 2013

A general framework of video segmentation to logical unit based on conditional random fields

Su Xu; Bailan Feng; Zhineng Chen; Bo Xu

Segmenting video into logical units, such as scenes in movies and topic units in news videos, is an essential prerequisite for a wide range of video-related applications. In this paper, a novel approach for logical unit segmentation based on conditional random fields (CRFs) is presented. In contrast to previous approaches that handle scenes and topic units separately, the proposed approach deals with them in a general framework. Specifically, four types of shots are defined and represented by four mid-level features, i.e., shot difference, scene transition, shot theme, and audio type. The problem of logical unit segmentation is then formulated as identifying the type of each shot from the extracted features by leveraging the CRFs model. The proposed framework effectively integrates visual, audio, and contextual features, and it produces strong results for both scene and topic unit segmentation. The effectiveness of the proposed approach is verified on seven mainstream types of videos, with average F-measures of 88% and 86% reported on scenes and topic units respectively, illustrating that the proposed method can accurately segment logical units in different genres of videos.
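
The full CRFs model is beyond a short sketch, but its core inference step, choosing the best sequence of shot types given per-shot scores and pairwise transition scores, can be illustrated with Viterbi decoding; all scores and the two-type label set below are made-up illustrative values:

```python
import numpy as np

def viterbi(emission, transition):
    """Most likely label sequence given per-shot log-scores
    (`emission`, shape [T, S]) and pairwise transition log-scores
    (`transition`, shape [S, S]): the standard decoding step for
    linear-chain sequence labeling such as CRFs."""
    T, S = emission.shape
    score = emission[0].copy()
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + transition + emission[t][None, :]
        back[t] = cand.argmax(axis=0)   # best previous label for each label
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Two shot types (0 = within-unit, 1 = unit boundary); transitions favor
# staying within a unit, so an isolated weak boundary cue at the third
# shot is smoothed away by its neighbors.
emission = np.log(np.array([[0.9, 0.1],
                            [0.6, 0.4],
                            [0.2, 0.8],
                            [0.9, 0.1]]))
transition = np.log(np.array([[0.9, 0.1],
                              [0.5, 0.5]]))
labels = viterbi(emission, transition)   # -> [0, 0, 0, 0]
```

This joint decoding over the whole shot sequence is what distinguishes a sequence model from classifying each shot independently (which here would label the third shot as a boundary).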


Conference on Multimedia Modeling (MMM) | 2014

Video to Article Hyperlinking by Multiple Tag Property Exploration

Zhineng Chen; Bailan Feng; Hongtao Xie; Rong Zheng; Bo Xu

Showing a video and an article on the same page, as done by news websites such as CNN.com and Yahoo!, provides a practical way for convenient information digestion. However, due to the absence of accompanying articles, this layout is infeasible for mainstream web video repositories like YouTube. This paper investigates the problem of hyperlinking web videos to relevant articles available on the Web. Given a video, the task is accomplished by first identifying its contextual tags (e.g., who is doing what, where, and when) and then employing search-based association to find relevant articles. Specifically, we propose a multiple tag property exploration (mTagPE) approach to identify contextual tags, where tag relevance, tag clarity, and tag correlation are defined and measured by leveraging visual duplicate analyses, online knowledge bases, and tag co-occurrence. The identification task is then formulated as a random walk along a tag relation graph that smoothly integrates the three properties. The random walk aims at picking relevant, clear, and correlated tags as the set of contextual tags, which is further treated as a query issued to commercial search engines to obtain relevant articles. We have conducted experiments on a large-scale web video dataset. Both objective performance evaluations and subjective user studies show the effectiveness of the proposed hyperlinking: it produces more accurate contextual tags and thus a larger number of relevant articles than other approaches.
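
The random-walk step can be sketched as a small power iteration with restart; the tag graph, restart prior, and all values below are made-up illustrative numbers, not the paper's mTagPE formulation:

```python
import numpy as np

def random_walk_rank(relation, restart, alpha=0.85, iters=100):
    """Rank tags by a random walk with restart over a tag relation graph.

    `relation[i, j]` is the non-negative edge strength between tags i and
    j (e.g. from co-occurrence); `restart` is a per-tag prior (e.g. a mix
    of relevance and clarity scores). A higher stationary probability
    marks a tag as more likely contextual."""
    P = relation / relation.sum(axis=1, keepdims=True)   # row-stochastic
    r = restart / restart.sum()
    p = r.copy()
    for _ in range(iters):
        p = alpha * (P.T @ p) + (1 - alpha) * r
    return p

tags = ["obama", "speech", "2009", "funny", "lol"]
# Hypothetical graph: the first three tags correlate strongly with each
# other; "funny" and "lol" form a weakly related, low-prior pair.
relation = np.array([[0.1, 0.9, 0.8, 0.1, 0.1],
                     [0.9, 0.1, 0.7, 0.1, 0.1],
                     [0.8, 0.7, 0.1, 0.1, 0.1],
                     [0.1, 0.1, 0.1, 0.1, 0.9],
                     [0.1, 0.1, 0.1, 0.9, 0.1]])
restart = np.array([0.9, 0.8, 0.7, 0.3, 0.2])   # prior relevance/clarity
scores = random_walk_rank(relation, restart)
contextual = [tags[i] for i in np.argsort(-scores)[:3]]
```

Tags that are individually plausible and mutually correlated reinforce each other through the walk, which is how the three properties get combined into one ranking.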

Collaboration


Dive into Zhineng Chen's collaborations.

Top Co-Authors

Bailan Feng | Chinese Academy of Sciences
Hongtao Xie | Chinese Academy of Sciences
Bo Xu | Chinese Academy of Sciences
Juan Cao | Chinese Academy of Sciences
Yongdong Zhang | University of Science and Technology of China
Jinfeng Bai | Chinese Academy of Sciences
Jintao Li | Chinese Academy of Sciences
Wei Zhang | Chinese Academy of Sciences
Yicheng Song | Chinese Academy of Sciences
Chong-Wah Ngo | City University of Hong Kong